For many random constraint satisfaction problems such as random satisfiability or random graph or hypergraph coloring, the best current estimates of the threshold for the existence of solutions are based on the first and the second moment method. However, in most cases these techniques do not yield matching upper and lower bounds. Sophisticated but non-rigorous arguments from statistical mechanics have ascribed this discrepancy to the existence of a phase transition called condensation that occurs shortly before the actual threshold for the existence of solutions and that affects the combinatorial nature of the problem, rendering the second moment method powerless (Krzakala, Montanari, Ricci-Tersenghi, Semerjian, Zdeborová: PNAS 2007). In this paper we prove for the first time that a condensation transition exists in a natural random CSP, namely in random hypergraph 2-coloring. Perhaps surprisingly, we find that the second moment method breaks down strictly before the condensation transition. Our proof also yields slightly improved bounds on the threshold for random hypergraph 2-colorability. We expect that our techniques can be extended to other, related problems such as random k-SAT or random graph k-coloring.
1 Introduction and results

For many random constraint satisfaction problems such as random k-SAT, random graph coloring, or random hypergraph coloring, the best current bounds on the thresholds for the existence of solutions derive from the first and the second moment method. However, in most cases these simple techniques do not yield matching upper and lower bounds. As a result, for most random CSPs the precise threshold for the existence of solutions remains unknown. Examples include random k-SAT, random graph k-coloring, and the 2-coloring problem in random $k$-uniform hypergraphs ($k \ge 3$).

In this paper we investigate the origin of this discrepancy with the example of the random hypergraph 2-coloring problem, in which the second moment analysis is technically relatively simple. First, we present an approach to improve slightly over the naive second moment argument. But more importantly, we establish the existence of a further phase transition below the threshold for the existence of solutions. At this so-called condensation transition, whose existence was predicted on grounds of sophisticated but non-rigorous statistical mechanics arguments [9, 16], the combinatorial nature of a 'typical' solution becomes significantly more complicated. Arguably, beyond the condensation transition it is conceptually more difficult to prove that solutions exist, and indeed in several random CSPs condensation seems to pose the key obstacle to determining the precise threshold for the existence of solutions. Here we prove rigorously for the first time that a condensation transition indeed exists.

To define the random hypergraph 2-coloring problem, let $V = \{1, \ldots, n\}$ be a set of vertices, let $k \ge 3$, and let $H_k(n,m)$ be a random $k$-uniform hypergraph on $V$ obtained by inserting a random set of $m$ edges out of the $\binom{n}{k}$ possible edges. A 2-coloring of $H$ is a map $\sigma: V \to \{0,1\}$ such that no edge $e$ of $H$ is monochromatic. Throughout the paper, we will let $r = m/n$ denote the density of the random hypergraph. An event $\mathcal{E}$ occurs with high probability ('w.h.p.') if its probability tends to one as $n \to \infty$. We let $S(H)$ denote the set of all 2-colorings of the hypergraph $H$, and we let $Z(H) = |S(H)|$.

Friedgut's sharp threshold theorem implies that for any $k \ge 3$ there exists a threshold $r_{col} = r_{col}(k,n)$ such that for any $\varepsilon > 0$ the random hypergraph $H_k(n,m)$ of density $r = m/n < (1-\varepsilon) r_{col}$ is 2-colorable w.h.p., while for $r > (1+\varepsilon) r_{col}$ it is w.h.p. not 2-colorable.¹ Although the precise threshold $r_{col}$ is not known for any $k \ge 3$, the first and the second moment methods can be used to derive upper and lower bounds. To put our results in perspective, let us briefly recap these techniques.

The first and the second moment method. The first moment method yields an upper bound on $r_{col}$. More precisely, by Markov's inequality,

$$P[H_k(n,m) \text{ is 2-colorable}] = P[Z \ge 1] \le E[Z].$$

Hence, if for some density $r$ the first moment satisfies $E[Z] = o(1)$, then $r_{col} \le r$ (for large enough $n$). Indeed, it is easy to compute $E[Z]$ explicitly, and to verify that there is a critical density $r_{first} = 2^{k-1}\ln 2 - \ln(2)/2 + o_k(1)$ such that $E[Z] = \exp(\Omega(n)) \gg 1$ if $r < r_{first}$ while $E[Z] = o(1)$ if $r > r_{first}$. Hence, $r_{col} \le r_{first}$.
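As a quick numerical check of the first moment calculation, the sketch below assumes the growth rate $\frac{1}{n}\ln E[Z] \to \ln 2 + r\ln(1-2^{1-k})$ that follows from Lemma 4.1 (Section 4), solves it for its root in $r$, and compares that root with the two-term expansion of $r_{first}$. The function names are illustrative only.

```python
import math

# Illustrative sketch; helper names are hypothetical.

def first_moment_rate(k: int, r: float) -> float:
    """Growth rate (1/n) ln E[Z] = ln 2 + r ln(1 - 2^(1-k))."""
    return math.log(2) + r * math.log1p(-2.0 ** (1 - k))

def r_first_exact(k: int) -> float:
    """Root of the growth rate in r: -ln 2 / ln(1 - 2^(1-k))."""
    return -math.log(2) / math.log1p(-2.0 ** (1 - k))

def r_first_asymptotic(k: int) -> float:
    """Two-term expansion 2^(k-1) ln 2 - ln(2)/2 (the o_k(1) term is dropped)."""
    return 2.0 ** (k - 1) * math.log(2) - math.log(2) / 2

for k in (3, 5, 10, 20):
    exact, approx = r_first_exact(k), r_first_asymptotic(k)
    print(f"k={k:2d}  root={exact:.6f}  expansion={approx:.6f}  diff={exact - approx:.2e}")
```

The gap between the root and the expansion shrinks roughly like $2^{-k}$, in line with the $o_k(1)$ term.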

Even though $E[Z] = \exp(\Omega(n))$ is exponentially large in $n$ for $r < r_{first}$, this does, of course, not imply that $H_k(n,m)$ is 2-colorable with high probability: it could simply be that a tiny number of hypergraphs drive up the expected number of 2-colorings because they possess excessively many of them. The purpose of the second moment method is to rule this possibility out. More precisely, the second moment argument is based on the Paley-Zygmund inequality

$$P[H_k(n,m) \text{ is 2-colorable}] = P[Z > 0] \ge \frac{E[Z]^2}{E[Z^2]}.$$



¹ It is widely conjectured that $\lim_{n\to\infty} r_{col}(k,n)$ exists for any $k \ge 3$. Hence we will take the liberty of just speaking of 'the threshold $r_{col}$' (for $k \ge 3$ given).



Hence, if for some density $r < r_{first}$ we can show that

$$E[Z^2] \le C \cdot E[Z]^2 \qquad (1)$$

with $C = C(k,r) > 0$ independent of $n$, then $P[H_k(n,m) \text{ is 2-colorable}] \ge 1/C$. That is, the probability of 2-colorability is bounded away from $0$ as $n \to \infty$. Therefore, the sharp threshold theorem implies that $r_{col} \ge r$. Indeed, Achlioptas and Moore [3] proved that there is a critical density $r_{second} = 2^{k-1}\ln 2 - (1+\ln 2)/2 + o_k(1)$ such that (1) holds for all $r < r_{second}$ but is violated for $r > r_{second}$. In summary, the first/second moment arguments yield the bounds

$$r_{second} = 2^{k-1}\ln 2 - \frac{1+\ln 2}{2} + o_k(1) \le r_{col} \le r_{first} = 2^{k-1}\ln 2 - \frac{\ln 2}{2} + o_k(1). \qquad (2)$$

Approaching the condensation threshold. How could we improve the lower bound on $r_{col}$? The second moment analysis in [3] is tight, and thus simply performing a better calculation will not suffice. Indeed, as observed in [3], for $r > r_{second}$ we have $E[Z^2] \ge \exp(\Omega(n)) \cdot E[Z]^2$, i.e., the second moment method fails dramatically. But why? One possibility could be that the expectation $E[Z]$ is driven up by a tiny minority of hypergraphs with excessively many 2-colorings, i.e., that $Z \le \exp(-\Omega(n))E[Z]$ w.h.p. In this case (1) would fail to hold because the second moment $E[Z^2]$ would exacerbate the contribution of the few 'rich' hypergraphs even more than the first moment. A second possibility is that $Z$ is 'close' to $E[Z]$ w.h.p., but without being sufficiently concentrated for (1) to hold. The following theorem, which improves the lower bound in (2) by an additive $(1-\ln 2)/2 \approx 0.153$, shows that up to $r_{cond} = 2^{k-1}\ln 2 - \ln 2 > r_{second}$, the second scenario is true.
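The constants in (2) and in $r_{cond}$ can be compared with a few lines of arithmetic. This sketch keeps only the leading terms (all $o_k(1)$ corrections are dropped), so it illustrates the ordering $r_{second} < r_{cond} < r_{first}$ and the additive improvement $(1-\ln 2)/2$, not the true thresholds for any particular $k$; the function name is hypothetical.

```python
import math

# Illustrative sketch; the helper name is hypothetical.
LN2 = math.log(2)

def leading_order_thresholds(k: int) -> dict:
    """Leading terms of r_second, r_cond and r_first (all o_k(1) terms dropped)."""
    base = 2.0 ** (k - 1) * LN2
    return {
        "r_second": base - (1 + LN2) / 2,
        "r_cond": base - LN2,
        "r_first": base - LN2 / 2,
    }

t = leading_order_thresholds(10)
assert t["r_second"] < t["r_cond"] < t["r_first"]
print("improvement:", t["r_cond"] - t["r_second"])  # equals (1 - ln 2)/2, about 0.1534
```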

Theorem 1.1 There is a constant $k_0 > 3$ such that for all $k \ge k_0$ and $r < r_{cond}$ the random hypergraph $H_k(n,m)$ is 2-colorable w.h.p. and

$$\ln Z \sim \ln E[Z] \quad \text{w.h.p.} \qquad (3)$$

For $r < r_{cond}$ the expected number $E[Z]$ of 2-colorings is exponentially large in $n$. Hence, (3) shows that for $r < r_{cond}$ w.h.p. $Z$ is exponentially large, as it coincides with $E[Z]$ up to a factor $\exp(o(n))$.

The proof of Theorem 1.1 is based on an enhanced second moment argument that takes the 'geometry' of the set $S(H_k(n,m))$ of 2-colorings of the random hypergraph into account. As a corollary of this argument, we obtain a result on the 'shape' of this set, viewed as a subset of the $n$-dimensional Hamming cube $\{0,1\}^n$ equipped with the Hamming distance. To state this result, let us say that a 2-coloring $\sigma$ of a hypergraph $H$ on $n$ vertices is $(\alpha,\beta,\gamma)$-shattered for $\alpha, \gamma > 0$ and $\beta > \alpha$ if the following is true.

SH1. There is no 2-coloring $\tau \in S(H)$ with $\alpha n < \mathrm{dist}(\sigma,\tau) < \beta n$.

SH2. The set $C_\alpha(\sigma)$ of all 2-colorings $\tau \in S(H)$ with $\mathrm{dist}(\sigma,\tau) \le \alpha n$ has size $|C_\alpha(\sigma)| \le \exp(-\gamma n)Z(H)$.

Intuitively, this means that $\sigma$ is part of a 'cluster' $C_\alpha(\sigma)$ of 2-colorings, whose size is exponentially smaller than the total number $Z(H)$ of 2-colorings. Furthermore, there is a 'gap' of size $(\beta-\alpha)n$ between this cluster and the remaining 2-colorings of $H$.
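For very small instances, SH1 and SH2 can be checked by brute force. The sketch below enumerates all of $\{0,1\}^n$ to obtain $S(H)$, so it is only feasible for tiny $n$; the helper names are hypothetical, and edges are given as tuples of vertex indices $0, \ldots, n-1$.

```python
import itertools
import math

# Illustrative sketch; helper names are hypothetical.

def two_colorings(n, edges):
    """S(H) by exhaustive search: all sigma in {0,1}^n with no monochromatic edge."""
    return [s for s in itertools.product((0, 1), repeat=n)
            if all(len({s[v] for v in e}) == 2 for e in edges)]

def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))

def is_shattered(sigma, colorings, alpha, beta, gamma):
    """Check SH1 and SH2 for sigma, where `colorings` is the full list S(H)."""
    n = len(sigma)
    dists = [hamming(sigma, t) for t in colorings]
    sh1 = not any(alpha * n < d < beta * n for d in dists)   # empty 'gap'
    cluster = sum(1 for d in dists if d <= alpha * n)         # |C_alpha(sigma)|
    sh2 = cluster <= math.exp(-gamma * n) * len(colorings)    # small cluster
    return sh1 and sh2
```

For example, `two_colorings(3, [(0, 1, 2)])` returns the six non-constant colorings of a single 3-element edge.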

Corollary 1.2 There is a constant $k_0 > 3$ such that for any $k \ge k_0$ there is $\gamma_k > 0$ such that for $r < r_{cond}$ all 2-colorings of the random hypergraph $H_k(n,m)$ are $(0.01, 0.49, \gamma_k)$-shattered w.h.p.






Corollary 1.2 implies that w.h.p. the set of 2-colorings of $H = H_k(n,m)$ has a decomposition $S(H) = \bigcup_{i=1}^{N} S_i$ into subsets that each comprise only an exponentially small fraction of all 2-colorings and that are mutually at Hamming distance at least $0.48n$. (Indeed, inductively choose $S_i$ to be the local cluster $C_{0.01}(\sigma)$ of some 2-coloring $\sigma \notin \bigcup_{j<i} S_j$.) This decomposition allows us to explain intuitively why the 'vanilla' second moment argument fails for $r_{second} < r < r_{cond}$. In fact, we can write $Z(H)^2 = \sum_{i,j=1}^{N} |S_i| \cdot |S_j|$. To estimate the expectation of this quantity, we need to bound the number $N$ of components and their sizes $|S_i|$. As we will see in Section 4, the naive second moment argument grossly overestimates the 'cluster sizes' $|S_i|$. We overcome this problem by investigating the internal structure of the 'clusters' $S_i$. We expect that this approach extends to other problems such as random k-SAT or random graph k-coloring, although the technical details will be far more intricate.
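The inductive construction of the sets $S_i$ can be written as a short greedy procedure. In this sketch (hypothetical helper names), each new $S_i$ is taken from the colorings not yet assigned, so the $S_i$ always partition the input; when SH1 holds for every coloring with $\beta > 2\alpha$, this restriction does not change the clusters. The identity $Z^2 = \sum_{i,j}|S_i||S_j|$ is checked at the end on a toy list of colorings.

```python
import itertools

# Illustrative sketch; helper names are hypothetical.

def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))

def greedy_clusters(colorings, radius):
    """Repeatedly pick an unassigned coloring sigma and form S_i from the
    unassigned colorings within Hamming distance `radius` of sigma."""
    remaining = list(colorings)
    clusters = []
    while remaining:
        sigma = remaining[0]
        clusters.append([t for t in remaining if hamming(sigma, t) <= radius])
        remaining = [t for t in remaining if hamming(sigma, t) > radius]
    return clusters

def pair_sum(clusters):
    """Sum of |S_i| * |S_j| over ordered pairs (i, j); equals Z^2 for a partition."""
    return sum(len(a) * len(b) for a, b in itertools.product(clusters, repeat=2))

cols = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0)]
clusters = greedy_clusters(cols, radius=1)
print(len(clusters), pair_sum(clusters), len(cols) ** 2)  # 2 16 16
```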

Into the condensation phase. As we will see next, even the enhanced second moment argument from Theorem 1.1 does not give the precise threshold for 2-colorability. The intuitive reason is that for densities beyond $r_{cond}$, the expected number $E[Z]$ of 2-colorings is indeed driven up excessively by a tiny minority of hypergraphs with an abundance of 2-colorings.

Theorem 1.3 There exist a constant $k_0 > 3$ and a sequence $\varepsilon_k \to 0$ such that for any $k \ge k_0$ there are $\delta_k > 0$, $C_k > 0$ such that the following two statements are true.

1. W.h.p. $H_k(n,m)$ is 2-colorable for all $r < r_{cond} + \varepsilon_k + \delta_k$.

2. For any density $r$ with $r_{cond} + \varepsilon_k < r < r_{col}$ we have

$$\ln Z \le \ln E[Z] - C_k n \quad \text{w.h.p.} \qquad (4)$$

The second statement asserts that for densities between $r_{cond} + \varepsilon_k$ and the actual (unknown) 2-colorability threshold $r_{col}$, the expected number $E[Z]$ of 2-colorings exceeds the actual number $Z$ by an exponential factor $\exp(C_k n)$ w.h.p. This contrasts with Theorem 1.1, which shows that below $r_{cond}$, $Z$ is of the same exponential order as $E[Z]$ w.h.p. Furthermore, the first part of Theorem 1.3 ensures that the regime of densities where (4) holds is non-empty, as the true threshold $r_{col}$ is indeed strictly greater than $r_{cond} + \varepsilon_k$. This so-called condensation transition at density $r_{cond} = 2^{k-1}\ln 2 - \ln 2$ was predicted on the basis of non-rigorous statistical mechanics arguments [9, 16].

In mathematical physics, the term 'phase transition' is usually defined as a point where the function $F(r) = \lim_{n\to\infty} \frac{1}{n}E[\ln(1+Z)]$ is non-analytic. However, it is not currently known whether the limit $F(r)$ exists. (Bayati, Gamarnik and Tetali [6] proved that for any density $r$, the corresponding limit of the partition function at any fixed positive temperature exists.) It is not difficult to see that Theorems 1.1 and 1.3 imply that around $r = r_{cond}$, the function $F(r)$ is in fact non-analytic if the limit exists: for $r < r_{cond}$, $F(r)$ coincides with the linear function $\lim_{n\to\infty}\frac{1}{n}\ln E[Z]$, while for $r_{cond} + \varepsilon_k < r < r_{col}$ Theorem 1.3 places it strictly below this function, so by the identity theorem $F$ cannot be analytic on any interval containing densities from both ranges.

The term 'condensation' is meant to express that w.h.p. the set $S(H_k(n,m))$ of all 2-colorings has a drastically different shape than in the 'shattered' regime of Corollary 1.2. To express this, let us call a 2-coloring $\sigma$ of a hypergraph $H$ on $n$ vertices $(\alpha,\beta,\gamma)$-condensed if

CO1. There is no 2-coloring $\tau \in S(H)$ with $\alpha n < \mathrm{dist}(\sigma,\tau) < \beta n$.

CO2. The set $C_\alpha(\sigma)$ of all 2-colorings $\tau \in S(H)$ with $\mathrm{dist}(\sigma,\tau) \le \alpha n$ has size $|C_\alpha(\sigma)| \ge \exp(-\gamma n)Z(H)$.

(The difference between SH1-SH2 and the above is that CO2 imposes a lower bound on $|C_\alpha(\sigma)|$.)
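CO1 and CO2 differ from SH1 and SH2 only in the direction of the size condition, as the following brute-force sketch makes explicit. The helper is hypothetical; `colorings` stands for the full list $S(H)$, e.g. obtained by exhaustive enumeration for a tiny hypergraph.

```python
import math

# Illustrative sketch; helper names are hypothetical.

def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))

def is_condensed(sigma, colorings, alpha, beta, gamma):
    """CO1: no coloring at distance in (alpha*n, beta*n) from sigma.
    CO2: the cluster C_alpha(sigma) has size >= exp(-gamma*n) * Z(H)."""
    n = len(sigma)
    dists = [hamming(sigma, t) for t in colorings]
    co1 = not any(alpha * n < d < beta * n for d in dists)
    cluster = sum(1 for d in dists if d <= alpha * n)
    co2 = cluster >= math.exp(-gamma * n) * len(colorings)
    return co1 and co2
```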

Corollary 1.4 There exist a constant $k_0 > 3$ and a sequence $\varepsilon_k \to 0$ such that for any $k \ge k_0$ there exists a sequence $r(n)$ of densities satisfying $|r(n) - r_{cond}| \le \varepsilon_k$ such that $H_k(n,m)$ with $m = r(n) \cdot n$ has the following two properties w.h.p.

1. $H_k(n,m)$ is 2-colorable.

2. A random 2-coloring $\sigma \in S(H_k(n,m))$ is $(0.01, 0.49, o(1))$-condensed w.h.p.



This means that at a particular density $r(n)$, i.e., right at the condensation transition, the size of the local cluster of a 'typical' 2-coloring $\sigma$ of $H_k(n,m)$ satisfies $\ln|C_{0.01}(\sigma)| \sim \ln Z$ w.h.p. In other words, the size of the cluster of a 'typical' 2-coloring has the same exponential order as the set of all 2-colorings. This contrasts with the 'shattered' scenario of Corollary 1.2, where w.h.p. all clusters only comprise an exponentially small fraction of the entire set $S(H_k(n,m))$. The statistical physics work [9, 16] suggests that indeed, the conclusions of Corollary 1.4 hold in the entire regime between the condensation transition and the 2-colorability threshold.

Discussion. The significance of the slightly better lower bound on the threshold for hypergraph 2-colorability provided by Theorem 1.1 is that it allows us to prove the existence of the condensation transition. Beyond the condensation transition, the combinatorial nature of the problem becomes far more complicated. To see why, consider the following random experiment with $r < r_{col}$ (so that $H_k(n,m)$ is 2-colorable w.h.p.).

G1. Choose a random hypergraph $H = H_k(n,m)$, conditional on $H$ being 2-colorable.

G2. Choose a 2-coloring $\sigma \in S(H)$ uniformly at random and output $(H,\sigma)$.

The above experiment induces a probability distribution $g_{k,n,m}$ on the set $\Lambda_k(n,m)$ of hypergraph/2-coloring pairs that we call the Gibbs distribution. For $r < r_{col}$ the experiment corresponds to sampling a random 2-coloring of a random hypergraph, and thus understanding the above experiment is the key to studying the combinatorial nature of the hypergraph 2-colorability problem. But the experiment seems genuinely difficult to analyze. In fact, even for densities $r = \Omega(2^k/k)$ far below the threshold for 2-colorability, it is not known how to efficiently construct, let alone sample, a 2-coloring of a random hypergraph [2].
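For intuition, experiment G1-G2 can be run literally on toy instances by rejection sampling and exhaustive enumeration. The sketch below (hypothetical helper names) takes time exponential in $n$, which is consistent with the remark that no efficient sampler is known; it is meant only for very small $n$.

```python
import itertools
import random

# Illustrative sketch; helper names are hypothetical.

def random_hypergraph(n, k, m, rng):
    """H_k(n, m): m distinct k-subsets of {0, ..., n-1}, chosen uniformly."""
    return rng.sample(list(itertools.combinations(range(n), k)), m)

def two_colorings(n, edges):
    """All sigma in {0,1}^n with no monochromatic edge (exhaustive search)."""
    return [s for s in itertools.product((0, 1), repeat=n)
            if all(len({s[v] for v in e}) == 2 for e in edges)]

def sample_gibbs(n, k, m, rng, max_tries=10_000):
    """G1: resample H until it is 2-colorable; G2: pick a uniform 2-coloring of H."""
    for _ in range(max_tries):
        edges = random_hypergraph(n, k, m, rng)
        colorings = two_colorings(n, edges)
        if colorings:
            return edges, rng.choice(colorings)
    raise RuntimeError("no 2-colorable hypergraph found")

edges, sigma = sample_gibbs(8, 3, 10, random.Random(1))
print(sigma, edges[:3])
```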

But there is a related experiment called the planted model that is rather easy to implement and to study.

P1. Choose $\sigma \in \{0,1\}^n$ uniformly at random.

P2. Choose a hypergraph $H = H_k(n,m,\sigma)$ with $m$ edges uniformly at random among all hypergraphs for which $\sigma$ is a proper 2-coloring, and output $(H,\sigma)$.

Let $p_{k,n,m}$ denote the distribution on $\Lambda_k(n,m)$ induced by P1-P2. It is not difficult to show that prior to the condensation phase, the distributions induced by the two experiments are 'close'.
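As stated above, P1-P2 is easy to implement. In this sketch (hypothetical helper names), the hypergraph is drawn by listing all $\binom{n}{k}$ candidate edges, keeping those that are bichromatic under $\sigma$, and sampling $m$ of them without replacement; this is polynomial in $n$ for fixed $k$. If $\sigma$ happens to admit fewer than $m$ bichromatic edges (e.g. a constant $\sigma$), `random.sample` raises `ValueError`.

```python
import itertools
import random

# Illustrative sketch; helper names are hypothetical.

def sample_planted(n, k, m, rng):
    """P1: sigma uniform in {0,1}^n. P2: m distinct edges drawn uniformly from
    the k-subsets that are bichromatic under sigma."""
    sigma = tuple(rng.randrange(2) for _ in range(n))
    allowed = [e for e in itertools.combinations(range(n), k)
               if len({sigma[v] for v in e}) == 2]
    edges = rng.sample(allowed, m)  # ValueError if fewer than m edges are allowed
    return edges, sigma

edges, sigma = sample_planted(12, 3, 20, random.Random(0))
print(sigma, edges[:3])
```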

Proposition 1.5 ([1]) Suppose that $r < r_{first}$ is such that $\ln Z \sim \ln E[Z]$ w.h.p. Then

$$\ln g_{k,n,m}\big[B \mid \{\ln Z \sim \ln E[Z]\}\big] \le \ln p_{k,n,m}[B] + o(n) \quad \text{for any event } B \subset \Lambda_k(n,m). \qquad (5)$$

The relationship (5) allows us to bound the probability of some 'bad' event $B$ in the Gibbs distribution by estimating its probability in the planted distribution. Indeed, Proposition 1.5 was used in [1] to study various properties of 'typical' 2-colorings of $H_k(n,m)$. In combination with Theorem 1.1 and the methods of [1], Proposition 1.5 can be used to get a pretty good idea of what a 2-coloring of the random hypergraph $H_k(n,m)$ "typically looks like" before the condensation transition.

But beyond the condensation transition, all bets are off. As Theorem 1.3 shows, in the condensed regime we have $\ln Z \le \ln E[Z] - \Omega(n)$ w.h.p., i.e., the assumption of Proposition 1.5 is violated. Roughly speaking, the gap $\ln Z \le \ln E[Z] - \Omega(n)$ implies that a pair chosen from the planted distribution P1-P2 corresponds to a pair chosen from the Gibbs distribution only with exponentially small probability. In fact, for densities beyond the condensation transition our proof of Theorem 1.3 exhibits an event $B$ for which (5) is violated, i.e., the planted model is no longer a good approximation to the Gibbs distribution. Furthermore, the statistical mechanics cavity technique suggests that getting a handle on the Gibbs measure (or other related measures) is far more complicated in the condensation phase. Overcoming this obstacle appears to be the remaining challenge to obtaining the precise threshold for hypergraph 2-colorability. The statistical mechanics reasoning [9, 16] suggests



Conjecture 1.6 There is $\varepsilon_k \to 0$ such that $r_{col} = 2^{k-1}\ln 2 - \left(\frac{\ln 2}{2} + \frac{1}{4}\right) + \varepsilon_k$.

One limitation of our approach is that we need to assume that $k \ge k_0$ is sufficiently big (whereas the standard second moment argument [3] applies to any $k \ge 3$). We need the lower bound on $k$ to carry out a sufficiently accurate analysis of the combinatorial structure of the solution space $S(H_k(n,m))$. No attempt has been made to compute (let alone optimize) $k_0$ or the various other constants.

2 Related work 

The two inequalities in (2) state the best previous bounds on the threshold for hypergraph 2-colorability from the paper of Achlioptas and Moore [3], which provided the prototype for the second moment analyses in other sparse random CSPs (e.g., [4, 5]). Since the second moment method is non-constructive, there is the separate algorithmic question: for what densities can a 2-coloring of a random hypergraph be constructed in polynomial time w.h.p.? The best current algorithm is known to succeed up to $r = c \cdot 2^k/k$ for some constant $c > 0$, i.e., up to a factor of about $k$ below the 2-colorability threshold [2].

In [1] the geometry of the set $S(H_k(n,m))$ of 2-colorings of the random hypergraph was investigated (among other things). It was shown that $S(H_k(n,m))$ shatters into exponentially small well-separated 'clusters' for densities $(1+\varepsilon_k)2^{k-1}\ln(k)/k < r < r_{second}$. Corollary 1.2 extends this picture up to $r < r_{cond}$. In addition, [1] also proved that in the regime $(1+\varepsilon_k)2^{k-1}\ln(k)/k < r < r_{second}$ a typical 2-coloring $\sigma$ of $H_k(n,m)$ is rigid w.h.p., in the sense that for most vertices $v$ any 2-coloring $\tau$ with $\sigma(v) \ne \tau(v)$ has Hamming distance $\Omega(n)$ from $\sigma$. Our analysis, most notably the study of the structure of a typical 'local cluster' in Section 5, builds substantially on the concepts of shattering and rigidity from [1], but we will have to elaborate them in considerably more detail to get close quantitative estimates.

In many random CSPs other than random hypergraph 2-coloring, the best current bounds on the thresholds for the existence of solutions derive from the second moment method as well. The most prominent examples are random graph k-coloring [4] and random k-SAT [5]. But the second moment argument extends naturally to a range of 'symmetric' random CSPs [17]. It would be interesting to see if/how our techniques can be generalized in order to prove the existence of a condensation transition in these other problems, particularly random graph k-coloring. However, since even the standard second moment analysis is quite involved in the case of random graph k-coloring, such a generalization will be technically challenging.

The random k-SAT problem is conceptually different because it is not 'symmetric'. More precisely, in random hypergraph 2-coloring the inverse $1 - \sigma$ of a 2-coloring $\sigma$ is a 2-coloring as well. This symmetry, which greatly simplifies the second moment argument, is absent in random k-SAT. As a consequence, as elaborated in [1, 5], in k-SAT the bound $E[Z^2] = O(E[Z]^2)$ does not hold for any density. Roughly speaking, to overcome this problem [5] focuses on a special type of satisfying assignments ("balanced" ones), whose number $Z_*$ satisfies $E[Z_*^2] = O(E[Z_*]^2)$. Technically, this is accomplished by weighting satisfying assignments cleverly. While our techniques can be extended easily to establish the existence of a condensation transition for these balanced satisfying assignments in random k-SAT, this does not imply that condensation occurs with respect to the bigger set of all satisfying assignments. This would require a new approach for the direct analysis of the total number of satisfying assignments in random k-SAT.

We emphasize that our techniques are quite different from the 'weighted' second moment method in [5]. Indeed, the 'asymmetry' that motivated the weighting scheme in [5] is absent in random hypergraph 2-coloring. Instead of weighting, we employ a new idea that exploits the combinatorial structure of the 'clusters' into which the set $S(H_k(n,m))$ of 2-colorings decomposes.

An example of a random CSP in which the precise threshold for the existence of solutions is known is random k-XORSAT. In this problem a second moment argument yields the precise thresholds (after 'pruning' the underlying hypergraph) [10, 19]. The explanation for this success is that random k-XORSAT does not have a condensation phase due to the algebraic nature of the problem. Similarly, in random k-SAT with $k > \log_2 n$ (i.e., the clause length is growing with $n$) there is no condensation phase and, in effect, the second moment method yields the precise satisfiability threshold [11, 13]. A further class of problems where the condensed phase is conjectured to be empty are the 'locked' problems of [22].

In statistical mechanics the condensation transition was first predicted (using non-rigorous techniques) for the random k-SAT and the random graph k-coloring problems [16]. For random hypergraph 2-coloring the statistical mechanics prediction for the condensation threshold was derived in [9]. The structure of the condensed phase is described using a non-rigorous framework called one-step replica symmetry breaking. Interestingly, it was also conjectured that the structure of the condensed phase for large k is very similar to the structure of the random subcube model [18]. Our proofs verify this for random hypergraph 2-coloring.

Random CSPs, including random hypergraph 2-coloring, have been studied in statistical mechanics as models of disordered systems (such as glasses) under the name 'diluted mean field models'. In this context the condensation transition corresponds to the so-called Kauzmann transition [15]. The present paper provides the first rigorous proof that this phase transition actually exists in a 'diluted mean field model'.

3 Preliminaries 

We need the following Chernoff bound on the tails of a binomially distributed random variable from [14, p. 21]. Let $\varphi(x) = (1+x)\ln(1+x) - x$ [14, p. 26].

Lemma 3.1 Let $X$ be a binomial random variable with mean $\mu > 0$. Then for any $t > 0$ we have

$$P[X \ge E[X] + t] \le \exp(-\mu \cdot \varphi(t/\mu)), \qquad P[X \le E[X] - t] \le \exp(-\mu \cdot \varphi(-t/\mu)).$$

In particular, for any $t > 1$ we have $P[X \ge t\mu] \le \exp[-t\mu\ln(t/e)]$.
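The two tail bounds of Lemma 3.1 can be compared against exact binomial tails for a concrete choice of parameters. The sketch below uses illustrative helper names and the arbitrary values $n = 200$, $p = 0.1$; everything is computed with the standard library.

```python
import math

# Illustrative sketch; helper names are hypothetical.

def phi(x: float) -> float:
    """phi(x) = (1 + x) ln(1 + x) - x for x > -1, extended by phi(-1) = 1."""
    return 1.0 if x == -1 else (1 + x) * math.log1p(x) - x

def binom_pmf(n: int, p: float, i: int) -> float:
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def upper_tail(n, p, j):
    """Exact P[X >= j] for X ~ Bin(n, p)."""
    return sum(binom_pmf(n, p, i) for i in range(max(j, 0), n + 1))

def lower_tail(n, p, j):
    """Exact P[X <= j] for X ~ Bin(n, p)."""
    return sum(binom_pmf(n, p, i) for i in range(0, min(j, n) + 1))

n, p = 200, 0.1
mu = n * p
for t in (2.0, 5.0, 10.0, 15.0):
    print(t,
          upper_tail(n, p, math.ceil(mu + t)), math.exp(-mu * phi(t / mu)),
          lower_tail(n, p, math.floor(mu - t)), math.exp(-mu * phi(-t / mu)))
```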

The following large deviations principle for the binomial distribution can be found, e.g., in [14, p. 27].

Lemma 3.2 Let $X = \mathrm{Bin}(n,p)$ be a binomial random variable with $\mu = np > 0$. Let $t$ be such that $\mu + t \in \{1, \ldots, n-1\}$. Then

$$\ln P[X = \mu + t] \sim -\mu\,\varphi(t/\mu) - (n-\mu)\,\varphi(-t/(n-\mu)).$$
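To see Lemma 3.2 at work, the sketch below compares the exact log-probability (via `math.lgamma`) with the right-hand side for a deviation $t$ proportional to $n$ (here $p = 0.3$ and $\mu + t = 0.4n$, arbitrary choices). The ratio approaches 1; the remaining gap comes from a polynomial prefactor, of order $\ln n$ on the log scale, which the asymptotic equivalence ignores. Helper names are illustrative.

```python
import math

# Illustrative sketch; helper names are hypothetical.

def phi(x: float) -> float:
    return (1 + x) * math.log1p(x) - x

def log_pmf(n: int, p: float, j: int) -> float:
    """Exact ln P[Bin(n, p) = j] via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log1p(-p))

def ld_rate(n: int, p: float, j: int) -> float:
    """Right-hand side of Lemma 3.2 with mu = np and t = j - mu."""
    mu = n * p
    t = j - mu
    return -mu * phi(t / mu) - (n - mu) * phi(-t / (n - mu))

def ratio(n: int, p: float = 0.3, x: float = 0.4) -> float:
    j = round(x * n)
    return log_pmf(n, p, j) / ld_rate(n, p, j)

for n in (100, 1_000, 10_000, 100_000):
    print(n, ratio(n))  # tends to 1 as n grows
```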

The following is a mild generalization of 'Laplace lemma' statements in [5, 10].

Lemma 3.3 Let $\psi \in C^2(0,1)$ be such that $\lim_{x\to 0}\psi(x) = \lim_{x\to 1}\psi(x) = 0$. Assume that $z \in (0,1)$ is the unique global maximum of $\psi$, that $\psi(z) > 0$, and that $\psi''(z) < 0$. Then

$$\sum_{d=1}^{n-1}\exp(n\psi(d/n)) \le O(\sqrt{n})\exp(n\psi(z)).$$

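Proof. Since $\psi \in C^2(0,1)$ and $\psi'(z) = 0$ (as $z$ is an interior maximum), Taylor's formula shows that

$$\psi(z+\delta) - \psi(z) = \frac{\delta^2}{2}\psi''(z) + o(\delta^2). \qquad (6)$$

Moreover, as $\lim_{x\to 0}\psi(x) = \lim_{x\to 1}\psi(x) = 0 < \psi(z)$ and $z$ is the unique global maximum, for any fixed $\delta > 0$ we have $\sup\{\psi(x) : |x - z| \ge \delta\} < \psi(z)$, and therefore

$$\sum_{d=1}^{n-1}\exp(n\psi(d/n)) \sim \sum_{(z-\delta)n < d < (z+\delta)n}\exp(n\psi(d/n)).$$

Suppose that $(z-\delta)n < d < (z+\delta)n$. Then (6) implies that for small enough $\delta$,

$$\exp[n\psi(d/n)] = \exp[n\psi(z)]\cdot\exp\left[n\left(\frac{\psi''(z)}{2}\Big(\frac{d}{n}-z\Big)^2 + o\Big(\Big(\frac{d}{n}-z\Big)^2\Big)\right)\right] \qquad (7)$$

$$\le \exp[n\psi(z)]\cdot\exp\left[\frac{\psi''(z)(d-zn)^2}{4n}\right]. \qquad (8)$$

Since $\psi''(z) < 0$, the Gaussian sum $\sum_{d\in\mathbb{Z}}\exp[\psi''(z)(d-zn)^2/(4n)]$ is $O(\sqrt{n})$. Combining (7) and (8) with this estimate yields the assertion. □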
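Lemma 3.3 can be illustrated with the binary entropy $\psi(x) = -x\ln x - (1-x)\ln(1-x)$, which satisfies its hypotheses with $z = 1/2$, $\psi(z) = \ln 2$ and $\psi''(z) = -4$. The sketch below (illustrative names) evaluates the sum stably in log-space and normalizes it by $\sqrt{n}\exp(n\psi(z))$. The normalized value stays bounded and, for this particular $\psi$, tends to $\sqrt{\pi/2}$, because near $d = n/2$ the term $\exp(nH(d/n))$ equals $\binom{n}{d}$ times a factor of about $\sqrt{\pi n/2}$.

```python
import math

# Illustrative sketch; helper names are hypothetical.

def log_laplace_sum(psi, n: int) -> float:
    """ln of sum_{d=1}^{n-1} exp(n * psi(d/n)), computed with log-sum-exp."""
    vals = [n * psi(d / n) for d in range(1, n)]
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def entropy(x: float) -> float:
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def laplace_ratio(n: int) -> float:
    """sum / (sqrt(n) * exp(n * psi(1/2))) for psi = binary entropy."""
    return math.exp(log_laplace_sum(entropy, n) - n * entropy(0.5) - 0.5 * math.log(n))

for n in (100, 1_000, 10_000):
    print(n, laplace_ratio(n))  # approaches sqrt(pi/2), about 1.2533
```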
The following lemma is implicit in [1].

Lemma 3.4 For any $\varepsilon > 0$ and any $k \ge 3$ the following is true. Suppose that $r < r_{cond}$. Then w.h.p. $H_k(n,m)$ is such that

$$\ln Z \sim E\ln[1+Z].$$



4 The enhanced second moment argument 

In the rest of this paper, we assume that $k \ge k_0$ for some large enough constant $k_0$. Moreover, to avoid floor and ceiling signs, we assume that $n$ is even.

4.1 The local cluster and the demise of the vanilla second moment argument 

We begin by briefly reviewing the 'vanilla' second moment method from [3]. This will provide the background for the enhanced second moment argument that yields Theorem 1.1. As a first step, we need to work out the expected number $E[Z]$ of 2-colorings.



Lemma 4.1 We have $E[Z] = \Theta\big(2^n(1-2^{1-k})^m\big)$.

Proof. Any fixed $\sigma: V \to \{0,1\}$ is a 2-coloring of $H_k(n,m)$ iff $H_k(n,m)$ does not feature an edge that consists of vertices in one color class $\sigma^{-1}(i)$ only ($i = 0, 1$). In other words, $\sigma$ 'forbids' $\binom{|\sigma^{-1}(0)|}{k} + \binom{|\sigma^{-1}(1)|}{k}$ out of the $\binom{n}{k}$ possible edges. Clearly, the number of 'forbidden' edges is minimized if both color classes $\sigma^{-1}(0), \sigma^{-1}(1)$ are the same size $n/2$. Furthermore, for all but an $o(1)$-fraction of all $2^n$ possible $\sigma: V \to \{0,1\}$ it is indeed true that both color classes have size $(1 \pm o(1))n/2$. By the linearity of expectation,

$$E[Z] = \sum_{\sigma: V\to\{0,1\}} \binom{\binom{n}{k} - \binom{|\sigma^{-1}(0)|}{k} - \binom{|\sigma^{-1}(1)|}{k}}{m}\binom{\binom{n}{k}}{m}^{-1} = \Theta\big(2^n(1-2^{1-k})^m\big),$$

where the last step follows from Stirling's formula. □
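The sum in the proof of Lemma 4.1 can be evaluated exactly for small $n$ by grouping the maps $\sigma$ according to the size $j$ of $\sigma^{-1}(0)$. The sketch below (illustrative names; $k = 3$ and $r = 2$ are arbitrary choices) prints the ratio of this exact value to $2^n(1-2^{1-k})^m$, which the lemma predicts to stay bounded away from $0$ and $\infty$ as $n$ grows.

```python
import math

# Illustrative sketch; helper names are hypothetical.

def expected_colorings(n: int, k: int, m: int) -> float:
    """Exact E[Z] for H_k(n, m), summing over the size j of color class 0."""
    total_edges = math.comb(n, k)
    denom = math.comb(total_edges, m)
    total = 0.0
    for j in range(n + 1):
        forbidden = math.comb(j, k) + math.comb(n - j, k)  # monochromatic k-sets
        # P[sigma is a 2-coloring] = C(N - forbidden, m) / C(N, m), exact big-int ratio
        total += math.comb(n, j) * (math.comb(total_edges - forbidden, m) / denom)
    return total

def ratio_to_lemma(n: int, k: int, r: float) -> float:
    """E[Z] / (2^n (1 - 2^(1-k))^m) with m = r n."""
    m = int(r * n)
    return expected_colorings(n, k, m) / (2.0 ** n * (1 - 2.0 ** (1 - k)) ** m)

for n in (40, 80, 160):
    print(n, ratio_to_lemma(n, 3, 2.0))
```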

Our goal is to identify the regime of densities $r$ where $E[Z^2] = O(E[Z]^2)$, i.e., where the second moment method 'works'. A technical issue is that $Z$ includes 2-colorings $\sigma$ whose color classes have (very) different sizes. To simplify our calculations we are going to confine ourselves to colorings $\sigma$ whose color classes $\sigma^{-1}(0), \sigma^{-1}(1)$ have the same size. More precisely, let us call $\sigma: V \to \{0,1\}$ equitable if $|\sigma^{-1}(0)| = |\sigma^{-1}(1)| = n/2$, and let $Z_e$ be the number of equitable 2-colorings of $H_k(n,m)$. Using Stirling's formula and, once more, the linearity of the expectation, it is not difficult to compute $E[Z_e]$: we have

$$E[Z_e] = \binom{n}{n/2}\binom{\binom{n}{k} - 2\binom{n/2}{k}}{m}\binom{\binom{n}{k}}{m}^{-1} = \Theta(1/\sqrt{n}) \cdot E[Z]. \qquad (9)$$



Now, for what $r$ do we have $E[Z_e^2] = O(E[Z_e]^2)$? We use the following elementary relation.

Fact 4.2 For any equitable $\sigma: V \to \{0,1\}$ we have $E[Z_e^2] = E[Z_e] \cdot E[Z_e \mid \sigma \text{ is a 2-coloring}]$.
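Proof. As $E[Z_e^2]$ equals the expected number of pairs of equitable 2-colorings, we find

$$E[Z_e^2] = \sum_{\sigma,\tau} P[\sigma \text{ is a 2-coloring}] \cdot P[\tau \text{ is a 2-coloring} \mid \sigma \text{ is a 2-coloring}] = \sum_{\sigma} P[\sigma \text{ is a 2-coloring}] \cdot E[Z_e \mid \sigma \text{ is a 2-coloring}],$$

where $\sigma, \tau$ range over equitable maps $V \to \{0,1\}$. By symmetry, $E[Z_e \mid \sigma \text{ is a 2-coloring}]$ is the same for all equitable $\sigma$. Moreover, by the linearity of the expectation we have $E[Z_e] = \sum_{\sigma} P[\sigma \text{ is a 2-coloring}]$. □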

Thus, we need to compute $E[Z_e \mid \sigma \text{ is a 2-coloring}]$. In other words, for a fixed equitable $\sigma \in \{0,1\}^n$ we need to study the random hypergraph $H_k(n,m)$ given that $\sigma$ is a 2-coloring. This conditional distribution can be expressed easily: just choose a set of $m$ edges uniformly at random from all edges that are bichromatic under $\sigma$ (cf. step P2 of the 'planted model' above). Let $H_k(n,m,\sigma)$ denote the resulting random hypergraph. Furthermore, given $\sigma$, let $Z_e(d)$ be the number of equitable 2-colorings $\tau$ with Hamming distance $\mathrm{dist}(\sigma,\tau) = d$. Similarly, let $Z(d)$ be the total number of 2-colorings $\tau$ with $\mathrm{dist}(\sigma,\tau) = d$. Then

$$E[Z_e \mid \sigma \text{ is a 2-coloring}] = \sum_{d=0}^{n} E_{H_k(n,m,\sigma)}[Z_e(d)] \le \sum_{d=0}^{n} E_{H_k(n,m,\sigma)}[Z(d)]. \qquad (10)$$

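Fact 4.3 ([3]) For any $0 < d < n$ (and, in the second formula, even $d$) we have

$$E_{H_k(n,m,\sigma)}[Z(d)] = \Theta\Big(\sqrt{n/(d(n-d))}\Big) \cdot \exp(n\psi(d/n)),$$

$$E_{H_k(n,m,\sigma)}[Z_e(d)] = \Theta\big(n/(d(n-d))\big) \cdot \exp(n\psi(d/n)),$$

with

$$\psi = \psi_{k,r}: (0,1) \to \mathbb{R}, \quad x \mapsto -x\ln(x) - (1-x)\ln(1-x) + r \cdot \ln\left(1 - \frac{1 - x^k - (1-x)^k}{2^{k-1}-1}\right).$$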
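A direct transcription of $\psi_{k,r}$ makes the properties listed in Lemma 4.4 easy to spot-check. The sketch below (illustrative names; $k = 8$, $r = 80$ is an arbitrary density below $r_{first}$) verifies that $\psi(1/2) = \ln 2 + r\ln(1-2^{1-k})$, the growth rate of $E[Z]$ from Lemma 4.1, that $\psi(1-x) = \psi(x)$, and, by finite differences, that $\psi'(1/2) \approx 0$ and $\psi''(1/2) < 0$.

```python
import math

# Illustrative sketch; helper names are hypothetical.

def psi(x: float, k: int, r: float) -> float:
    """psi_{k,r}(x) from Fact 4.3, for 0 < x < 1."""
    entropy = -x * math.log(x) - (1 - x) * math.log1p(-x)
    inner = (1 - x**k - (1 - x) ** k) / (2 ** (k - 1) - 1)
    return entropy + r * math.log1p(-inner)

def numeric_derivatives(f, x: float, h: float = 1e-4):
    """Central finite differences for f'(x) and f''(x)."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2
    return d1, d2

k, r = 8, 80.0
print(psi(0.5, k, r), math.log(2) + r * math.log1p(-2.0 ** (1 - k)))  # the two values agree
print(psi(0.25, k, r) - psi(0.75, k, r))                             # 0 up to rounding
print(numeric_derivatives(lambda x: psi(x, k, r), 0.5))              # about 0, and negative
```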



Fact 4.3 and (10) reduce the problem of computing $E[Z_e \mid \sigma \text{ is a 2-coloring}]$ (and thus $E[Z_e^2]$) to an exercise in calculus: we just need to study the function $\psi$.

Lemma 4.4 ([3]) Suppose $r < r_{first}$. The function $\psi$ satisfies $\psi(1/2) \sim \frac{1}{n}\ln E[Z]$, $\psi(1-x) = \psi(x)$, $\psi'(1/2) = 0$, and $\psi''(1/2) < 0$. Moreover,

1. if $\psi(1/2) > \psi(x)$ for all $x \in (0,1) \setminus \{1/2\}$, then $E[Z_e \mid \sigma \text{ is a 2-coloring}] \le O(E[Z_e])$;

2. if there is some $x \in (0,1)$ with $\psi(x) > \psi(1/2)$, then $E[Z_e \mid \sigma \text{ is a 2-coloring}] \ge E[Z] \cdot \exp(\Omega(n))$.

Lemma 4.4 shows that the second moment method 'works' if and only if $r$ is such that the function $\psi$ takes its global maximum at $1/2$. Thus, let $r_{second}$ be the supremum of all $r > 0$ with this property. Using basic calculus, one verifies that $r_{second} = 2^{k-1}\ln 2 - \frac{1+\ln 2}{2} + o_k(1)$ (see [3, Section 7]), and that for $r > r_{second}$ the function $\psi$ attains its maximum, strictly greater than $\psi(1/2)$, in the interval $(0, 2^{-k/2})$. In effect, the second part of Lemma 4.4 shows that $E[Z^2] \ge E[Z_e^2] \ge \exp(\Omega(n))E[Z]^2$ for $r > r_{second}$, i.e., the 'vanilla' second moment argument breaks down beyond $r_{second}$.
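The definition of $r_{second}$ suggests a simple numerical procedure: bisect on $r$ until the maximum of $\psi$ near $0$ just matches $\psi(1/2)$. The sketch below assumes, as the discussion above indicates, that the competing maximum lies in $(0, 2^{-k/2})$ and that the sign of the difference changes only once between $r_{first} - 2$ and $r_{first}$; it uses a log-spaced grid rather than exact optimization, and all names are hypothetical. For $k = 14$ the result should land close to the expansion $2^{k-1}\ln 2 - (1+\ln 2)/2$.

```python
import math

# Illustrative sketch; helper names are hypothetical.

def psi(x: float, k: int, r: float) -> float:
    """psi_{k,r}(x) from Fact 4.3, written to avoid cancellation for small x."""
    entropy = -x * math.log(x) - (1 - x) * math.log1p(-x)
    one_minus_pow = -math.expm1(k * math.log1p(-x))       # 1 - (1 - x)^k
    inner = (one_minus_pow - x**k) / (2 ** (k - 1) - 1)
    return entropy + r * math.log1p(-inner)

def local_gap(k: int, r: float, step: float = 0.002, x_min: float = 1e-9) -> float:
    """Max of psi over a log-spaced grid in [x_min, 2^(-k/2)] minus psi(1/2)."""
    best, x = -math.inf, 2.0 ** (-k / 2)
    while x > x_min:
        best = max(best, psi(x, k, r))
        x *= math.exp(-step)
    return best - psi(0.5, k, r)

def r_second_numeric(k: int, iterations: int = 30) -> float:
    """Bisect for the density at which the maximum near 0 overtakes psi(1/2)."""
    r_first = -math.log(2) / math.log1p(-2.0 ** (1 - k))
    lo, hi = r_first - 2.0, r_first
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if local_gap(k, mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

k = 14
print(r_second_numeric(k), 2 ** (k - 1) * math.log(2) - (1 + math.log(2)) / 2)
```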



4.2 Improving the second moment argument: proof of Theorem 1.1

To improve over the naive second moment argument, we take another look at the function $\psi$. Let $a = 2^{-k/2}$. Once more using basic calculus (see Section 4.4), we find

Lemma 4.5 Suppose that $r_{second} < r < r_{first}$.

1. We have $\sup_{0<x<a}\psi(x) > \psi(1/2) > 0$.

2. For all $x \in (a, 1/2 - a) \cup (1/2 + a, 1 - a)$ we have $\psi(x) < -\psi(1/2) < 0$.

3. In the interval $[a, 1-a]$ the function $\psi$ attains its unique maximum at $1/2$.
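Lemma 4.5 can likewise be spot-checked on grids for one concrete pair $(k, r)$. The sketch below (illustrative names) uses $k = 14$ and $r = 2^{13}\ln 2 - 0.6$, which lies between the leading-order values of $r_{second}$ and $r_{first}$; it tests each of the three items on a finite grid, which is evidence for, not a proof of, the lemma at that density.

```python
import math

# Illustrative sketch; helper names are hypothetical.

def psi(x: float, k: int, r: float) -> float:
    entropy = -x * math.log(x) - (1 - x) * math.log1p(-x)
    one_minus_pow = -math.expm1(k * math.log1p(-x))   # 1 - (1 - x)^k
    inner = (one_minus_pow - x**k) / (2 ** (k - 1) - 1)
    return entropy + r * math.log1p(-inner)

def check_lemma_4_5(k: int, r: float, samples: int = 20_000):
    """Grid check of the three items of Lemma 4.5 for a single (k, r)."""
    a = 2.0 ** (-k / 2)
    centre = psi(0.5, k, r)
    # Item 1: log-spaced grid in (0, a).
    near_zero = [a * math.exp(-15 * i / samples) for i in range(1, samples)]
    item1 = max(psi(x, k, r) for x in near_zero) > centre > 0
    # Item 2: linear grid in (a, 1/2 - a); the mirrored interval follows by symmetry.
    middle = [a + (0.5 - 2 * a) * i / samples for i in range(1, samples)]
    item2 = all(psi(x, k, r) < -centre for x in middle)
    # Item 3: on [a, 1 - a], psi never exceeds its value at 1/2 (the grid contains 1/2).
    whole = [a + (1 - 2 * a) * i / samples for i in range(samples + 1)]
    item3 = all(psi(x, k, r) <= centre for x in whole)
    return item1, item2, item3

print(check_lemma_4_5(14, 2 ** 13 * math.log(2) - 0.6))
```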

Lemma 4.5 allows us to deduce important information on the geometry of the set $S(H_k(n,m,\sigma))$ of 2-colorings (arguments similar to the following have been used in [1] to prove that the set of all 2-colorings of $H_k(n,m)$ shatters into exponentially many well-separated pieces for a certain range of $r$). Indeed, combining Fact 4.3 and Lemma 4.5 we see that for distances $an < d < (\frac{1}{2} - a)n$, the expected number of 2-colorings at distance $d$ from $\sigma$ is exponentially small:

$$E_{H_k(n,m,\sigma)}[Z(d)] = \exp\big((1+o(1))\psi(d/n)n\big) \le \exp(-\Omega(n)).$$

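Hence, by Markov's inequality and the union bound over the at most $n$ possible values of $d$, $H_k(n,m,\sigma)$ does not have any 2-coloring $\tau$ such that $\mathrm{dist}(\sigma,\tau) \in (an, (\frac{1}{2}-a)n)$ w.h.p. Similarly, w.h.p. there is no 2-coloring $\tau$ with $\mathrm{dist}(\sigma,\tau) \in ((\frac{1}{2}+a)n, (1-a)n)$. Thus, w.h.p. the set of 2-colorings of $H_k(n,m,\sigma)$ decomposes into the 'local cluster'

$$C(\sigma) = \{\tau \in S(H_k(n,m,\sigma)) : \mathrm{dist}(\sigma,\tau) \le an\}$$

of colorings 'close' to $\sigma$, the corresponding inverse colorings $\{1 - \tau : \tau \in C(\sigma)\}$, and the remaining colorings $\tau$ with $\frac{1}{2} - a \le \mathrm{dist}(\sigma,\tau)/n \le \frac{1}{2} + a$.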

With this picture in mind, we can interpret the maximum of $\psi$ in $(0,a)$ as the exponential growth rate of the expected size of the local cluster. More precisely, by Fact 4.3,

$$E_{H_k(n,m,\sigma)}|C(\sigma)| = \sum_{0 \le d \le an} E_{H_k(n,m,\sigma)}[Z(d)] = \exp\Big[(1+o(1))\,n \cdot \sup_{0<x<a}\psi(x)\Big]. \qquad (11)$$



Hence, the 'vanilla' second moment argument breaks down for $r > r_{second}$ because the expected size of the local cluster in $H_k(n,m,\sigma)$ exceeds the expected number $E[Z]$ of 2-colorings in $H_k(n,m)$.

Our improvement over the plain second moment argument rests on the observation that for densities $r > r_{second}$ the expected size $E|C(\sigma)|$ exaggerates the typical size of the local cluster. More precisely, in Section 5 below we will investigate the combinatorial structure of the 'planted' hypergraph $H_k(n,m,\sigma)$ closely to prove the following key fact.

Proposition 4.6 Let $\sigma \in \{0,1\}^n$ be equitable. If $r < r_{cond}$, then w.h.p. in the random hypergraph $H_k(n,m,\sigma)$ the set $C(\sigma) = \{\tau \in S(H_k(n,m,\sigma)) : \mathrm{dist}(\sigma,\tau) \le 2^{-k/2}n\}$ has size $|C(\sigma)| \le E[Z_e]$.

Fix a density $r < r_{cond}$. Let us call a 2-coloring $\sigma$ of a hypergraph $H$ good if $\sigma$ is equitable and its local cluster $C(\sigma) = \{\tau \in S(H) : \mathrm{dist}(\sigma,\tau) \le 2^{-k/2}n\}$ has size $|C(\sigma)| \le E[Z_e]$. Furthermore, let $Z_g$ be the number of good 2-colorings of $H_k(n,m)$.

Corollary 4.7 For any $r < r_{cond}$ we have $E[Z_g] \sim E[Z_e] = \Theta(n^{-1/2})E[Z]$.



Proof. Let $\mathcal{H}$ be the set of all $k$-uniform hypergraphs on $V = \{1, \ldots, n\}$ with precisely $m$ edges. Let $\Lambda_e$ be the set of all pairs $(H,\sigma)$ with $H \in \mathcal{H}$ and $\sigma \in S(H)$ equitable. Furthermore, let $\Lambda_g$ be the set of all pairs $(H,\sigma)$ with $H \in \mathcal{H}$ and $\sigma$ a good 2-coloring of $H$. Then $E[Z_e] = |\Lambda_e|/|\mathcal{H}|$ and $E[Z_g] = |\Lambda_g|/|\mathcal{H}|$. Hence, it suffices to show that $|\Lambda_e| \sim |\Lambda_g|$. But this is evident from Proposition 4.6. Indeed, Proposition 4.6 implies that $|\{H \in \mathcal{H} : (H,\sigma) \in \Lambda_g\}| \sim |\{H \in \mathcal{H} : (H,\sigma) \in \Lambda_e\}|$ for any equitable $\sigma$. □

Corollary 4.8 Suppose that $r<r_{\mathrm{cond}}$. For any equitable σ we have

$$P[\sigma\text{ is a good 2-coloring of }H_k(n,m)]\sim P[\sigma\text{ is a valid 2-coloring of }H_k(n,m)].$$

Proof. Since the total number of equitable $\sigma\in\{0,1\}^n$ equals $2\binom{n}{\lfloor n/2\rfloor}$ and because the uniform distribution over hypergraphs is invariant under permutations of the vertices, we have

$$E[Z_g]=2\binom{n}{\lfloor n/2\rfloor}P[\sigma\text{ is a good 2-coloring of }H_k(n,m)],$$

$$E[Z_e]=2\binom{n}{\lfloor n/2\rfloor}P[\sigma\text{ is a 2-coloring of }H_k(n,m)].$$

Hence, the assertion follows from Corollary 4.7. □

We are going to compute the second moment $E[Z_g^2]$. The exact same calculation that we used to prove Fact 4.2 shows that $E[Z_g^2]\le C\cdot E[Z_g]^2$ if for any equitable σ we have

$$E[Z_g\mid\sigma\text{ is a good 2-coloring}]\le C\cdot E[Z_g].\qquad(12)$$

Thus, we are left to verify that for $r<r_{\mathrm{cond}}$ there is $C=C(k,r)$ such that (12) holds.

Let $\alpha=2^{-k/2}$. Letting $Z_g(d)$ denote the number of good 2-colorings at Hamming distance d from σ, we obtain

$$\sum_{0\le d<\alpha n}E[Z_g(d)\mid\sigma\text{ is good}]\le E[|C(\sigma)|\mid\sigma\text{ is good}]\le E[Z_e];\qquad(13)$$

the last inequality follows because if σ is good, then $|C(\sigma)|\le E[Z_e]$ with certainty. Further, by Corollary 4.8,

$$\begin{aligned}\sum_{\alpha n<d<n/2}E[Z_g(d)\mid\sigma\text{ is good}]&\le\sum_{\alpha n<d<n/2}E[Z_e(d)\mid\sigma\text{ is good}]\\&\le\sum_{\alpha n<d<n/2}E[Z_e(d)\mid\sigma\text{ is a valid 2-coloring}]\cdot\frac{P[\sigma\text{ is a valid 2-coloring}]}{P[\sigma\text{ is good}]}\\&=\Theta(1/n)\sum_{\alpha n<d<n/2}\exp[n\psi(d/n)]\qquad\text{(by Fact 4.3).}\end{aligned}$$

an<d<n/2 

Furthermore, Lemma 3.3, Lemma 4.3 and Corollary 4.7 imply that

$$\sum_{\alpha n<d<n/2}E[Z_g(d)\mid\sigma\text{ is good}]\le(1+o(1))\sum_{\alpha n<d<n/2}E[Z_e(d)\mid\sigma\text{ is a valid 2-coloring}]\le C'\cdot E[Z_e]\qquad(14)$$

for a certain constant $C'=C'(k,r)$. Since furthermore $Z_g(d)=Z_g(n-d)$ due to the symmetry of the 2-coloring problem with respect to swapping the color classes, (13) and (14) yield (12) with $C=2(C'+1)$.



Hence, we have shown that $E[Z_g^2]\le C\cdot E[Z_g]^2$ for all $r<r_{\mathrm{cond}}$. Corollary 4.7 and the Paley-Zygmund inequality therefore imply that

$$P[Z>0]\ge P[Z\ge E[Z_g]/2]\ge P[Z_g\ge E[Z_g]/2]\ge1/(4C).\qquad(15)$$

In particular, since 2-colorability has a sharp threshold, the threshold $r_{\mathrm{col}}$ for 2-colorability cannot be smaller than $r_{\mathrm{cond}}$, whence indeed $H_k(n,m)$ is 2-colorable w.h.p. for any $r<r_{\mathrm{cond}}$. The second claim follows from (15) together with Lemma 3.4.
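The Paley-Zygmund step in (15) only uses the bound $E[X^2]\le C\,E[X]^2$ for a nonnegative random variable X: then $P[X>\theta E[X]]\ge(1-\theta)^2E[X]^2/E[X^2]$, which for θ = 1/2 gives 1/(4C). The following Python sketch is purely illustrative (the toy distribution and helper names are hypothetical, not from the paper); it evaluates the bound exactly with rational arithmetic and compares it with the true tail probability.

```python
from fractions import Fraction

def moments(dist):
    """Return (E[X], E[X^2]) for a finite distribution given as {value: probability}."""
    m1 = sum(p * x for x, p in dist.items())
    m2 = sum(p * x * x for x, p in dist.items())
    return m1, m2

def paley_zygmund_bound(dist, theta):
    """Lower bound (1 - theta)^2 * E[X]^2 / E[X^2] on P[X > theta * E[X]]."""
    m1, m2 = moments(dist)
    return (1 - theta) ** 2 * m1 * m1 / m2

def exact_tail(dist, theta):
    """Exact value of P[X > theta * E[X]]."""
    m1, _ = moments(dist)
    return sum(p for x, p in dist.items() if x > theta * m1)

# Toy nonnegative variable: X = 0 w.p. 1/2, X = 2 w.p. 1/4, X = 6 w.p. 1/4.
toy = {0: Fraction(1, 2), 2: Fraction(1, 4), 6: Fraction(1, 4)}
half = Fraction(1, 2)
m1, m2 = moments(toy)
C = m2 / (m1 * m1)  # smallest C with E[X^2] <= C * E[X]^2
print(exact_tail(toy, half), paley_zygmund_bound(toy, half), 1 / (4 * C))  # prints 1/2 1/10 1/10
```

With θ = 1/2 the bound coincides with 1/(4C), exactly the constant appearing in (15); the true tail can of course be much larger.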

4.3 Beyond the condensation transition: proof of Theorem 1.3

The goal in this section is to establish Theorem 1.3, i.e., to prove that there is a non-empty regime of densities $r_{\mathrm{cond}}<r<r_{\mathrm{col}}$ in which $H_k(n,m)$ is 2-colorable but $\ln Z<\ln E[Z]-\Omega(n)$ w.h.p. To get an intuition why this should be the case, consider a density $r>r_{\mathrm{cond}}+\varepsilon_k$. In Proposition 4.9 below we will see that w.h.p. (for suitable $\varepsilon_k$), the size $|C(\sigma)|$ of the 'local cluster' in the planted model $H_k(n,m,\sigma)$ is bigger by an exponential factor $\exp(\Omega(n))$ than the expected number $E[Z]$ of 2-colorings of $H_k(n,m)$. However, if it were true that $\ln Z\sim\ln E[Z]$, then Proposition 1.5 would imply that the planted model and the Gibbs distribution (first choose $H_k(n,m)$ and then choose $\sigma\in S(H_k(n,m))$ uniformly at random) are 'close'. In particular, in a random pair $(H,\sigma)$ chosen from the Gibbs distribution the local cluster $C(\sigma)$ should have size $|C(\sigma)|\ge E[Z]\exp(\Omega(n))$. This would lead to the absurd conclusion that under the Gibbs distribution $Z\ge|C(\sigma)|\ge E[Z]\exp(\Omega(n))$ w.h.p. (in obvious contradiction to Markov's inequality). Hence, intuitively the condensation transition occurs because the size of the local cluster in the planted model $H_k(n,m,\sigma)$ surpasses the expected number $E[Z]$ of 2-colorings of $H_k(n,m)$. Indeed, it is not difficult to turn this intuition into a proof of part 2 of Theorem 1.3 (see the proof of Theorem 1.3 below). But the above still allows for the possibility that the condensation phase may just be empty, i.e., that the typical size of the local cluster in the planted model $H_k(n,m,\sigma)$ is bounded by $E[Z]$ for the entire regime of r where $H_k(n,m)$ is 2-colorable w.h.p. The purpose of this section is to show that this is not so.

To prove this, we are going to show that w.h.p. $H_k(n,m)$ has a 2-coloring σ whose local cluster $C(\sigma)$ is smaller than $E[Z]$, i.e., much smaller than the local cluster in the planted model. As we will see in Section 5 below, the size of the local cluster of a 2-coloring σ is governed by the edges that contain precisely one vertex v with color i and $k-1$ vertices with color $1-i$ (with either $i=0$ or $i=1$). Let us call such edges critical under σ. Intuitively, critical edges 'freeze' v by preventing v from switching to the opposite color $1-i$, thereby reducing the entropy of the local cluster.
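To make the definition concrete, here is a minimal Python sketch of critical edges and their supporting vertices. It assumes $k\ge3$, edges given as tuples of vertex indices, and a coloring given as a list of 0/1 values; the function names are illustrative only and do not come from the paper.

```python
def is_two_coloring(edges, sigma):
    """True if no edge is monochromatic under sigma (a list of 0/1 colors)."""
    return all(len({sigma[v] for v in e}) == 2 for e in edges)

def supporting_vertex(edge, sigma):
    """Return the vertex supporting `edge` under sigma, or None if the edge is not critical.

    For k >= 3, an edge is critical if exactly one of its vertices has some color i
    and the remaining k - 1 vertices all have color 1 - i.
    """
    ones = [v for v in edge if sigma[v] == 1]
    zeros = [v for v in edge if sigma[v] == 0]
    if len(ones) == 1 and len(zeros) >= 2:
        return ones[0]
    if len(zeros) == 1 and len(ones) >= 2:
        return zeros[0]
    return None

def critical_edges(edges, sigma):
    """Map each critical edge (as a tuple) to its supporting vertex."""
    out = {}
    for e in edges:
        v = supporting_vertex(e, sigma)
        if v is not None:
            out[tuple(e)] = v
    return out

# Toy example with k = 4.
sigma = [0, 0, 1, 1, 0, 1]
edges = [(0, 1, 2, 4), (2, 3, 5, 0), (0, 2, 3, 1)]
print(is_two_coloring(edges, sigma), critical_edges(edges, sigma))
```

In the example, vertex 2 is the only vertex of color 1 in the first edge and vertex 0 the only vertex of color 0 in the second, so these vertices support them; the third edge is bichromatic but not critical.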

Given this intuition, it seems natural to assume that 2-colorings that have a particularly high number of critical edges should have rather small local clusters. Thus, we say that a 2-coloring σ of $H_k(n,m)$ is $(1+\beta)$-critical if σ is equitable and the total number of critical edges equals $(1+\beta)km/(2^{k-1}-1)$. Let $Z_{1+\beta}$ be the number of $(1+\beta)$-critical 2-colorings. Furthermore, let us call a $(1+\beta)$-critical 2-coloring σ good if indeed the local cluster

$$C(\sigma)=\{\tau\in S(H):\mathrm{dist}(\sigma,\tau)<2^{-k/2}n\}$$

satisfies $|C(\sigma)|\le E[Z_{1+\beta}]$, and let $Z_{g,1+\beta}$ be the number of good $(1+\beta)$-critical 2-colorings.
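The normalization $(1+\beta)km/(2^{k-1}-1)$ is $(1+\beta)$ times the expected number of critical edges given that σ is a 2-coloring: of the $2^k-2$ bichromatic color patterns of an edge, exactly $2k$ are critical, so a random bichromatic edge is critical with probability $2k/(2^k-2)=k/(2^{k-1}-1)$. This assumes, as is asymptotically the case for an equitable σ, that all color patterns of a random edge are equally likely. A short exact enumeration in Python (an illustration, not part of the original argument):

```python
from fractions import Fraction
from itertools import product

def critical_fraction(k):
    """Fraction of bichromatic color patterns of a k-edge that are critical (exact enumeration)."""
    bichromatic = critical = 0
    for pattern in product((0, 1), repeat=k):
        ones = sum(pattern)
        if 0 < ones < k:  # bichromatic
            bichromatic += 1
            if ones == 1 or ones == k - 1:  # exactly one vertex differs from the rest
                critical += 1
    return Fraction(critical, bichromatic)

for k in range(3, 9):
    assert critical_fraction(k) == Fraction(k, 2 ** (k - 1) - 1)
```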

Proposition 4.9 For any $k\ge k_0$ there exist a density $r_{\mathrm{crit}}>r_{\mathrm{cond}}$, $\delta_k>0$, and $\beta_k>0$ such that for $r=r_{\mathrm{crit}}$ the following three statements hold.

1. We have $E[Z_{g,1+\beta_k}]\sim E[Z_{1+\beta_k}]=\exp(\Omega(n))$.

2. Let $\sigma\in\{0,1\}^n$ be equitable and let $H=H_k(n,m,\sigma)$ be a hypergraph chosen from the planted model. Then w.h.p. the local cluster $C(\sigma)=\{\tau\in S(H):\mathrm{dist}(\sigma,\tau)<2^{-k/2}n\}$ has size $|C(\sigma)|\ge E[Z]\cdot\exp(\delta_kn)$.



We defer the proof of Proposition 4.9 to Section 5.

In the sequel, we fix $k\ge k_0$ big enough and let $r=r_{\mathrm{crit}}$ and $\beta=\beta_k$ be as in Proposition 4.9. In the rest of this section, we are going to carry out a second moment argument for $Z_{g,1+\beta}$ to show the following.

Proposition 4.10 With r, β as above, we have

$$E[Z_{g,1+\beta}^2]\le C\cdot E[Z_{g,1+\beta}]^2$$

for some constant $C=C(k)>0$.

As before, this amounts to showing that

$$E[Z_{g,1+\beta}\mid\sigma\text{ is a good }(1+\beta)\text{-critical 2-coloring}]\le C\cdot E[Z_{g,1+\beta}]\qquad(16)$$

for some number $C=C(k,r)>0$. To establish (16), we let $Z_{g,1+\beta}(d)$ signify the number of good $(1+\beta)$-critical 2-colorings at Hamming distance d from σ. Let $\gamma=2^{-k/2}$. The very definition of 'good' ensures that

$$\sum_{d<\gamma n}E[Z_{g,1+\beta}(d)\mid\sigma\text{ is a good }(1+\beta)\text{-critical 2-coloring}]\le E[Z_{1+\beta}]\sim E[Z_{g,1+\beta}].\qquad(17)$$

The following bound covers 'intermediate' distances.

Lemma 4.11 We have

$$\sum_{\gamma n<d<(1/2-\gamma)n}E[Z_{g,1+\beta}(d)\mid\sigma\text{ is a good }(1+\beta)\text{-critical 2-coloring}]=o(1).$$

Proof. Let $\gamma n<d<(1/2-\gamma)n$. Let τ be equitable and at distance d from σ. Let us briefly say τ is valid if τ is a 2-coloring of $H_k(n,m)$. Then

$$\begin{aligned}P[\tau\text{ is valid}\mid\sigma\text{ is a good }(1+\beta)\text{-critical 2-coloring}]&=\frac{P[\tau\text{ is valid},\ \sigma\text{ is a good }(1+\beta)\text{-critical 2-coloring}]}{P[\sigma\text{ is a good }(1+\beta)\text{-critical 2-coloring}]}\\&\le\frac{P[\sigma,\tau\text{ are valid}]}{P[\sigma\text{ is a good }(1+\beta)\text{-critical 2-coloring}]}\\&=P[\tau\text{ is valid}\mid\sigma\text{ is valid}]\cdot\frac{P[\sigma\text{ is valid}]}{P[\sigma\text{ is a good }(1+\beta)\text{-critical 2-coloring}]}\\&\le P[\tau\text{ is valid}\mid\sigma\text{ is valid}]\cdot\frac{E[Z]}{E[Z_{g,1+\beta}]}\le P[\tau\text{ is valid}\mid\sigma\text{ is valid}]\cdot E[Z],\end{aligned}$$

because $E[Z_{g,1+\beta}]\ge1$ by our choice of β. Hence, Lemma 4.3 yields

$$\ln E[Z_{g,1+\beta}(d)\mid\sigma\text{ is a good }(1+\beta)\text{-critical 2-coloring}]\le\ln E[Z]+\ln E[Z_e(d)\mid\sigma\text{ is valid}]\le-n\psi(1/2)+n\psi(d/n)+O(1)<0,$$

as $\gamma<d/n<1/2-\gamma$. Summing over d yields the assertion. □

Thus, we are left to estimate the contribution of distances $(1/2-\gamma)n<d<n/2$. We need to characterize the conditional distribution of $H_k(n,m)$ given that some equitable σ is a $(1+\beta)$-critical 2-coloring. But since $H_k(n,m)$ is just a uniformly random hypergraph with m edges, this is straightforward: let $H_k(n,m_1,m_2,\sigma)$ denote the random hypergraph generated as follows:

• Choose uniformly at random a set $E_1$ of $m_1$ edges that are critical with respect to σ.

• Choose uniformly at random a set $E_2$ of $m_2$ edges that are bichromatic under σ but not critical.

• Let $H_k(n,m_1,m_2,\sigma)=(V,E_1\cup E_2)$.

Then for $m_1=(1+\beta)km/(2^{k-1}-1)$ and for $m_2=m-m_1$, the conditional distribution of $H_k(n,m)$ given that σ is $(1+\beta)$-critical is precisely $H_k(n,m_1,m_2,\sigma)$.

To estimate $E_{H_k(n,m_1,m_2,\sigma)}[Z_{g,1+\beta}(d)]$, we need to study the conditional probability that a certain equitable τ at distance d from σ is $(1+\beta)$-critical.

Lemma 4.12 Let τ be equitable at distance $d=an$ from σ. Then $P_{H_k(n,m_1,m_2,\sigma)}[\tau\text{ is }(1+\beta)\text{-critical}]\sim\mathcal E(a)$, where

$$\mathcal E(a)=(1-v_1)^{m_1}(1-v_2)^{m_2}\,P\Big[\mathrm{Bin}\Big(m_1,\frac{u_1}{1-v_1}\Big)+\mathrm{Bin}\Big(m_2,\frac{u_2}{1-v_2}\Big)=m_1\Big]$$

with

$$\begin{aligned}u_1&=(1-a)^k+a^k+(k-1)a^2(1-a)^{k-2}+(k-1)a^{k-2}(1-a)^2,\\v_1&=a(1-a)^{k-1}+(1-a)a^{k-1},\\u_2&=\frac{k\big(1-a^k-(1-a)^k-a^{k-1}(1-a)-a(1-a)^{k-1}-(k-1)a^{k-2}(1-a)^2-(k-1)a^2(1-a)^{k-2}\big)}{2^{k-1}-k-1},\\v_2&=\frac{2-2\big[a^k+(1-a)^k+ka(1-a)^{k-1}+ka^{k-1}(1-a)\big]}{2^k-2k-2}.\end{aligned}$$
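Lemma 4.12 is explicit enough to evaluate numerically. The Python sketch below is a hypothetical helper, not code from the paper: it implements $u_1,v_1,u_2,v_2$ from the closed forms above and $\mathcal E(a)$ by convolving the two binomial distributions exactly. It is only meant for small $m_1,m_2$, since the floating-point binomial probabilities overflow for large m.

```python
from math import comb, isclose

def u1(a, k):
    return (1 - a) ** k + a ** k + (k - 1) * a ** 2 * (1 - a) ** (k - 2) + (k - 1) * a ** (k - 2) * (1 - a) ** 2

def v1(a, k):
    return a * (1 - a) ** (k - 1) + (1 - a) * a ** (k - 1)

def u2(a, k):
    num = (1 - a ** k - (1 - a) ** k - a ** (k - 1) * (1 - a) - a * (1 - a) ** (k - 1)
           - (k - 1) * a ** (k - 2) * (1 - a) ** 2 - (k - 1) * a ** 2 * (1 - a) ** (k - 2))
    return k * num / (2 ** (k - 1) - k - 1)

def v2(a, k):
    num = 2 - 2 * (a ** k + (1 - a) ** k + k * a * (1 - a) ** (k - 1) + k * a ** (k - 1) * (1 - a))
    return num / (2 ** k - 2 * k - 2)

def binom_pmf(m, p):
    """List of P[Bin(m, p) = x] for x = 0, ..., m (floating point; small m only)."""
    return [comb(m, x) * p ** x * (1 - p) ** (m - x) for x in range(m + 1)]

def calE(a, k, m1, m2):
    """E(a) = (1-v1)^m1 (1-v2)^m2 P[Bin(m1, u1/(1-v1)) + Bin(m2, u2/(1-v2)) = m1]."""
    b1, b2 = v1(a, k), v2(a, k)
    pmf1 = binom_pmf(m1, u1(a, k) / (1 - b1))
    pmf2 = binom_pmf(m2, u2(a, k) / (1 - b2))
    # P[X1 + X2 = m1] by direct convolution over the value x1 of X1.
    conv = sum(pmf1[x1] * pmf2[m1 - x1] for x1 in range(m1 + 1) if m1 - x1 <= m2)
    return (1 - b1) ** m1 * (1 - b2) ** m2 * conv

# Sanity checks: tau = sigma (a = 0) is (1+beta)-critical with certainty, and at a = 1/2
# both conditional probabilities equal q = k / (2^(k-1) - 1).
k = 5
print(calE(0.0, k, 10, 30), u1(0.5, k) / (1 - v1(0.5, k)), k / (2 ** (k - 1) - 1))
```

The symmetry $u_i(1-a)=u_i(a)$, $v_i(1-a)=v_i(a)$ makes $\mathcal E$ symmetric about 1/2, which the tests below check numerically.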



Proof. By enumerating all possibilities, we see that the probability that a given edge that is critical under σ is also critical under τ equals $u_1$ (either all its vertices have the same color under both σ and τ, or they all have the opposite color under τ, or the colors of the supporting vertex and exactly one other vertex differ, or the supporting vertex keeps its color and the colors of exactly $k-2$ others differ). Similarly, the probability that an edge that is critical under σ is monochromatic under τ works out to be $v_1$.

Now, take a random edge that has $2\le l\le k-2$ vertices of color 1 under σ. The probability that the edge is monochromatic under τ equals

$$a^l(1-a)^{k-l}+(1-a)^la^{k-l}.$$

Convolving this formula with the distribution of the number of edges with a given number of vertices of color one under σ, we obtain

$$v_2=\sum_{l=2}^{k-2}\binom kl\frac{a^l(1-a)^{k-l}+(1-a)^la^{k-l}}{2^k-2k-2}.$$

This is the probability that a random edge that is neither critical nor monochromatic under σ is monochromatic under τ.

Furthermore, the probability that a random edge that has precisely l vertices of color 1 is critical under τ equals

$$la^{l-1}(1-a)^{k-l+1}+l(1-a)^{l-1}a^{k-l+1}+(k-l)(1-a)^{l+1}a^{k-l-1}+(k-l)a^{l+1}(1-a)^{k-l-1}.$$

Convolving this formula with the distribution of the number of edges with a given number of vertices of color one under σ, we get

$$u_2=\sum_{l=2}^{k-2}\binom kl\frac{la^{l-1}(1-a)^{k-l+1}+l(1-a)^{l-1}a^{k-l+1}+(k-l)(1-a)^{l+1}a^{k-l-1}+(k-l)a^{l+1}(1-a)^{k-l-1}}{2^k-2k-2}.$$

This is the probability that a random edge that is neither critical nor monochromatic under σ is critical under τ. Thus, given that it is bichromatic under τ, a random edge that is critical, resp. bichromatic but not critical, under σ is critical under τ with conditional probability

$$\frac{u_1}{1-v_1}\qquad\text{resp.}\qquad\frac{u_2}{1-v_2}.$$

Since the m edges are drawn independently up to the trivial dependence that no edge is drawn twice, we thus see that the probability that τ is $(1+\beta)$-critical is $(1+o(1))\mathcal E(a)$. □
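The enumeration in the proof can be checked mechanically. The sketch below is a hypothetical verification script, not from the paper. It assumes, as the proof implicitly does, that each vertex of a random edge flips its color independently with probability a, and that the σ-pattern of the edge is uniform over the critical, resp. the bichromatic non-critical, patterns. It compares the resulting exact probabilities (rational arithmetic) with the closed forms of Lemma 4.12 for small $k\ge4$.

```python
from fractions import Fraction
from itertools import product

def is_critical(colors):
    ones = sum(colors)
    return ones == 1 or ones == len(colors) - 1

def is_mono(colors):
    return len(set(colors)) == 1

def transition_probs(k, a, critical_under_sigma):
    """(P[critical under tau], P[monochromatic under tau]) for a random edge that is critical
    (resp. bichromatic but non-critical) under sigma, each vertex flipping w.p. a."""
    crit = mono = 0
    total = 0  # number of admissible sigma-patterns, each equally likely
    for sigma_pat in product((0, 1), repeat=k):
        if is_mono(sigma_pat) or is_critical(sigma_pat) != critical_under_sigma:
            continue
        total += 1
        for flips in product((0, 1), repeat=k):
            w = Fraction(1)
            for f in flips:
                w *= a if f else 1 - a
            tau_pat = tuple(c ^ f for c, f in zip(sigma_pat, flips))
            if is_critical(tau_pat):
                crit += w
            if is_mono(tau_pat):
                mono += w
    return crit / total, mono / total

def closed_forms(k, a):
    u1 = (1 - a) ** k + a ** k + (k - 1) * a ** 2 * (1 - a) ** (k - 2) + (k - 1) * a ** (k - 2) * (1 - a) ** 2
    v1 = a * (1 - a) ** (k - 1) + (1 - a) * a ** (k - 1)
    u2 = k * (1 - a ** k - (1 - a) ** k - a ** (k - 1) * (1 - a) - a * (1 - a) ** (k - 1)
              - (k - 1) * a ** (k - 2) * (1 - a) ** 2 - (k - 1) * a ** 2 * (1 - a) ** (k - 2)) / (2 ** (k - 1) - k - 1)
    v2 = (2 - 2 * (a ** k + (1 - a) ** k + k * a * (1 - a) ** (k - 1) + k * a ** (k - 1) * (1 - a))) / (2 ** k - 2 * k - 2)
    return u1, v1, u2, v2

k, a = 5, Fraction(1, 3)
u1, v1, u2, v2 = closed_forms(k, a)
print(transition_probs(k, a, True) == (u1, v1), transition_probs(k, a, False) == (u2, v2))
```

For k = 3 every bichromatic edge is critical, so the non-critical case (and the denominator $2^{k-1}-k-1$) only makes sense for $k\ge4$.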



Corollary 4.13 For any $0<d<n$ we have

$$\frac1n\ln E_{H_k(n,m_1,m_2,\sigma)}[Z_{g,1+\beta}(d)]\le g(d/n)+o(1),\qquad\text{with}\qquad g(a)=h(a)+\frac1n\ln\mathcal E(a),$$

where $h(x)=-x\ln x-(1-x)\ln(1-x)$.

Proof. This simply follows from Lemma 4.12 and the fact that the number of τ at distance d from σ is $\binom nd$ and $\frac1n\ln\binom nd\sim h(d/n)$ by Stirling. □
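The Stirling step $\frac1n\ln\binom nd\sim h(d/n)$ is easy to check numerically. A small Python illustration (not from the paper), using `math.lgamma` to avoid huge integers; the gap shrinks roughly like $\frac{\ln n}{2n}$:

```python
from math import comb, lgamma, log

def log_binom(n, d):
    """Natural log of the binomial coefficient C(n, d) via the log-gamma function."""
    return lgamma(n + 1) - lgamma(d + 1) - lgamma(n - d + 1)

def h(x):
    """Entropy function h(x) = -x ln x - (1 - x) ln(1 - x), with h(0) = h(1) = 0."""
    if x in (0, 1):
        return 0.0
    return -x * log(x) - (1 - x) * log(1 - x)

assert abs(log_binom(10, 3) - log(comb(10, 3))) < 1e-9  # agrees with exact binomials

for n in (100, 1_000, 10_000, 100_000):
    d = n // 3
    print(n, log_binom(n, d) / n - h(d / n))
```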

Lemma 4.14 The function g from Corollary 4.13 takes its unique maximum in the interval $(1/2-\gamma,1/2+\gamma)$ at 1/2, and $g''(1/2)<0$.

Proof. Let τ be equitable and at distance an from σ. Moreover, let $X_1(a)$ be the number of edges of $H_k(n,m_1,m_2,\sigma)$ that are critical under both σ and τ. In addition, let $X_2(a)$ be the number of edges that are critical under τ but not under σ. As $H_k(n,m_1,m_2,\sigma)$ consists of two independent 'portions' of random edges, namely $m_1$ that are critical under σ and another $m_2$ that are not, $X_1(a)$ and $X_2(a)$ are independent. Furthermore,

$$X_1(a)\sim\mathrm{Bin}(m_1,q_1(a)),\qquad X_2(a)\sim\mathrm{Bin}(m-m_1,q_2(a)),$$

where $q_1(a)=\frac{u_1}{1-v_1}(a)$ and $q_2(a)=\frac{u_2}{1-v_2}(a)$. Let

$$p(a)=P[X_1(a)+X_2(a)=m_1],\qquad p(a,x_1,x_2)=P[X_1(a)=x_1\wedge X_2(a)=x_2],$$

so that

$$p(a)=\sum_{x_1,x_2:\,x_1+x_2=m_1}p(a,x_1,x_2).$$

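Let us first investigate the point $a=1/2$. As $q_1(1/2)=q_2(1/2)$, we have

$$(X_1+X_2)(1/2)\sim\mathrm{Bin}(m,q)\qquad\text{with}\qquad q=q_1(1/2)=q_2(1/2)=\frac{2k/2^k}{1-2/2^k}=\frac k{2^{k-1}-1}.$$

Using the binomial estimate from Section 3, we can compute p(1/2) directly:

$$p(1/2)=\binom m{m_1}q^{m_1}(1-q)^{m-m_1}=\Theta(n^{-1/2})\cdot\exp\Big[-mq\,\varphi\Big(\frac{m_1-mq}{mq}\Big)-(1-q)m\,\varphi\Big(\frac{mq-m_1}{m(1-q)}\Big)\Big],$$

where $\varphi(x)=(1+x)\ln(1+x)-x$. Hence, since $m_1=(1+\beta)qm$,

$$p(1/2)=\Theta(n^{-1/2})\cdot\exp\Big[-mq\,\varphi(\beta)-(1-q)m\,\varphi\Big(-\frac{q\beta}{1-q}\Big)\Big].$$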
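The estimate for p(1/2) rests on a standard fact: the exponent $-mq\,\varphi(\cdot)-(1-q)m\,\varphi(\cdot)$ equals exactly $-m$ times the Kullback-Leibler divergence between $x/m$ and q, and the exact binomial probability differs from its exponential only by a factor $\Theta(m^{-1/2})$. The Python sketch below is an illustration with an arbitrarily chosen q and a 20% deviation (as for β = 0.2), not taken from the paper; the gap it prints matches $-\frac12\ln(2\pi x(m-x)/m)$.

```python
from math import lgamma, log, pi

def phi(x):
    """phi(x) = (1 + x) ln(1 + x) - x, with the convention phi(-1) = 1."""
    return 1.0 if x == -1 else (1 + x) * log(1 + x) - x

def log_binom_pmf(m, q, x):
    """Exact ln P[Bin(m, q) = x]."""
    return (lgamma(m + 1) - lgamma(x + 1) - lgamma(m - x + 1)
            + x * log(q) + (m - x) * log(1 - q))

def exponent(m, q, x):
    """The exponent -mq*phi((x-mq)/(mq)) - (1-q)m*phi((mq-x)/(m(1-q)))."""
    return -m * q * phi((x - m * q) / (m * q)) - (1 - q) * m * phi((m * q - x) / (m * (1 - q)))

q = 0.1
for m in (1_000, 10_000, 100_000):
    x = int(1.2 * m * q)  # upward deviation by a factor 1 + beta with beta = 0.2
    gap = log_binom_pmf(m, q, x) - exponent(m, q, x)
    # gap = -(1/2) ln(2 pi x (m - x) / m) + O(1/x), i.e. a Theta(m^{-1/2}) prefactor
    print(m, round(gap, 4), round(-0.5 * log(2 * pi * x * (m - x) / m), 4))
```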



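To proceed, we need to decompose this expression according to the individual contributions of $X_1(1/2)$ and $X_2(1/2)$. Let $x_1,x_2$ be such that $x_1+x_2=m_1$. Since $X_1(1/2)$ and $X_2(1/2)$ are independent, we have

$$\begin{aligned}p(1/2,x_1,x_2)&=P[X_1=x_1\wedge X_2=x_2]=P[\mathrm{Bin}(m_1,q)=x_1]\cdot P[\mathrm{Bin}(m_2,q)=x_2]\\&=\Theta(n^{-1})\exp\Big[-qm_1\varphi\Big(\frac{x_1-m_1q}{m_1q}\Big)-(1-q)m_1\varphi\Big(\frac{m_1q-x_1}{(1-q)m_1}\Big)\Big]\\&\qquad\cdot\exp\Big[-qm_2\varphi\Big(\frac{x_2-m_2q}{m_2q}\Big)-(1-q)m_2\varphi\Big(\frac{m_2q-x_2}{(1-q)m_2}\Big)\Big]\end{aligned}$$

and

$$p(1/2)=\sum_{x_1+x_2=m_1}p(1/2,x_1,x_2).$$

Similarly, for general a we have

$$\begin{aligned}p(a,x_1,x_2)&=\Theta(n^{-1})\exp\Big[-q_1(a)m_1\varphi\Big(\frac{x_1-m_1q_1(a)}{m_1q_1(a)}\Big)-(1-q_1(a))m_1\varphi\Big(\frac{m_1q_1(a)-x_1}{(1-q_1(a))m_1}\Big)\Big]\\&\qquad\cdot\exp\Big[-q_2(a)m_2\varphi\Big(\frac{x_2-m_2q_2(a)}{m_2q_2(a)}\Big)-(1-q_2(a))m_2\varphi\Big(\frac{m_2q_2(a)-x_2}{(1-q_2(a))m_2}\Big)\Big]\end{aligned}$$

and

$$p(a)=\sum_{x_1+x_2=m_1}p(a,x_1,x_2).$$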

As $u_i(1-a)=u_i(a)$ and $v_i(1-a)=v_i(a)$ for $a\in(0,1)$, we have $u_i'(1/2)=v_i'(1/2)=0$ for $i=1,2$. Moreover, a direct calculation shows that

$$u_1''(1/2)=O(k^3/2^k),\qquad u_2''(1/2)=O(k^4/4^k).$$

Hence, by the chain rule, $\frac{\partial}{\partial a}\ln p(a,x_1,x_2)\big|_{a=1/2}=0$ and $\frac1n\frac{\partial^2}{\partial a^2}\ln p(a,x_1,x_2)\big|_{a=1/2}=O(k^3/2^k)$, whence

$$\frac1n\ln\frac{p(1/2-\delta,x_1,x_2)}{p(1/2,x_1,x_2)}=\delta^2\cdot O(k^3/2^k)+O(\delta^3).\qquad(18)$$

Furthermore, as $v_1''(1/2)=O(k^2/2^k)$ and $v_2''(1/2)=O(k^3/4^k)$, we also obtain

$$\frac1n\ln\frac{(1-v_1(1/2-\delta))^{m_1}(1-v_2(1/2-\delta))^{m_2}}{(1-v_1(1/2))^{m_1}(1-v_2(1/2))^{m_2}}=\delta^2\cdot O(k^3/2^k)+O(\delta^3).\qquad(19)$$

As the derivatives of the entropy are

$$h'(a)=-\ln a+\ln(1-a),\qquad h''(a)=-\frac1a-\frac1{1-a},$$

(18) and (19) yield

$$g(1/2-\delta)=g(1/2)-2\delta^2+\delta^2\cdot O(k^3/2^k)+O(\delta^3).\qquad(20)$$

Finally, (20) shows that g takes its unique maximum in $(1/2-\gamma,1/2+\gamma)$ at 1/2, and $g''(1/2)<0$. □



Combining (17), Lemma 4.11 and Lemma 4.14, we obtain (16). This completes the proof of Proposition 4.10.

Proof of Theorem 1.3. Let $r=r_{\mathrm{crit}}$, $\delta=\delta_k$, $\beta=\beta_k$ be as in Proposition 4.9. Then by Proposition 4.10 and the Paley-Zygmund inequality, the probability that $H_k(n,m)$ is 2-colorable is bounded from below by a positive constant. As 2-colorability in $H_k(n,m)$ has a sharp threshold, this implies that $r_{\mathrm{col}}>r_{\mathrm{crit}}$. This proves the first assertion of Theorem 1.3.

To prove the second assertion, fix $r=r_{\mathrm{crit}}$. Then the second part of Proposition 4.9 implies that w.h.p.

$$\frac1n\ln Z(H_k(n,m,\sigma))>\frac1n\ln E[Z]+\delta.\qquad(21)$$



However, there is no obvious way to derive the second assertion in Theorem 1.3 directly from (21), because it is not clear a priori that the random variable $\frac1n\ln Z(H_k(n,m,\sigma))$ is tightly concentrated. Therefore, we will replace it by another random variable for which concentration is easy to show. Namely, for any $b>0$ we let

$$Z_b=\sum_{\sigma\in\{0,1\}^n}\exp(-b\,w(\sigma)),$$

where $w(\sigma)$ is the number of monochromatic edges under σ. (The above random variable is called the partition function at inverse temperature b.) The random variable $\ln Z_b$ satisfies a Lipschitz condition: either adding or removing a single edge can change the value of $\ln Z_b$ by at most b. Therefore, Azuma's inequality implies that in both the random hypergraph $H_k(n,m)$ and in the planted model $H_k(n,m,\sigma)$ we have

$$P\big[|\ln Z_b-E\ln Z_b|>y\big]\le2\exp\Big(-\frac{y^2}{2mb^2}\Big).\qquad(22)$$
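A brute-force Python illustration of $Z_b$ and of the Lipschitz property behind (22), feasible only for tiny n (the parameters below are arbitrary choices, not from the paper). Deleting an edge lowers $w(\sigma)$ by 0 or 1 for every σ, so each term $\exp(-b\,w(\sigma))$ is multiplied by 1 or $e^{b}$, and hence $\ln Z_b$ moves by at most b.

```python
import itertools
import math
import random

def log_Zb(n, edges, b):
    """ln Z_b = ln sum_sigma exp(-b * w(sigma)), by brute force over all 2^n colorings."""
    total = 0.0
    for sigma in itertools.product((0, 1), repeat=n):
        w = sum(1 for e in edges if len({sigma[v] for v in e}) == 1)  # monochromatic edges
        total += math.exp(-b * w)
    return math.log(total)

random.seed(1)
n, k, m, b = 10, 3, 15, 2.0
edges = [tuple(random.sample(range(n), k)) for _ in range(m)]
base = log_Zb(n, edges, b)
for i in range(m):
    # Removing edge i changes ln Z_b by at most b.
    diff = abs(log_Zb(n, edges[:i] + edges[i + 1:], b) - base)
    assert diff <= b + 1e-12
print(base, n * math.log(2))  # ln Z_b never exceeds n ln 2
```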



We are going to derive an upper bound on $E\ln Z_b(H_k(n,m))$. To this end, let $S_\mu$ denote the set of all $\sigma\in\{0,1\}^n$ with $w(\sigma)=\mu$ in $H_k(n,m)$. Then

$$Z_b=\sum_{\mu=0}^{m}\exp(-b\mu)\cdot|S_\mu|.\qquad(23)$$



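Furthermore, letting $\mu=\gamma m$ for some small $\gamma>0$ and using a local estimate for binomial probabilities, we obtain

$$\frac1n\ln E|S_\mu|\sim\ln2+\frac1n\ln P[\mathrm{Bin}(m,2^{1-k})=\mu]\sim\ln2-\frac r{2^{k-1}}\Big[\varphi(2^{k-1}\gamma-1)+(2^{k-1}-1)\,\varphi\Big(\frac{1-2^{k-1}\gamma}{2^{k-1}-1}\Big)\Big],\qquad(24)$$

where $\varphi(x)=(1+x)\ln(1+x)-x$. Plugging (24) into (23), we see that

$$\frac1nE\ln Z_b\le\frac1n\ln EZ(H_k(n,m))+\varepsilon_b,\qquad(25)$$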



where $\varepsilon_b\to0$ as $b\to\infty$. Intuitively, this mirrors the fact that the partition function is dominated by assignments that violate an $o_b(1)$-fraction of all edges as $b\to\infty$. From now on, fix b large enough so that $\varepsilon_b<\delta/4$. Thus, (22) and (25) imply that

$$P_{H_k(n,m)}\Big[\frac1n\ln Z_b>\frac1n\ln E[Z(H_k(n,m))]+\frac\delta3\Big]=o(1).\qquad(26)$$



We will contrast (26) with the situation in the planted model. Let $\zeta>0$ be sufficiently small and let $\tau\in\{0,1\}^n$ be such that $\big||\tau^{-1}(0)|-|\tau^{-1}(1)|\big|<\zeta n$. Then there is an equitable σ such that $\mathrm{dist}(\sigma,\tau)<\zeta n$. In $H_k(n,m,\sigma)$ the number of edges that are monochromatic under τ has a binomial distribution with mean at most $k\zeta m$. Therefore, we obtain

$$\frac1nE\ln Z_b(H_k(n,m,\tau))\ge\frac1nE\ln Z_b(H_k(n,m,\sigma))-\frac{bk\zeta m}n-o(1).$$

Combining this with (21) and choosing $\zeta>0$ sufficiently small, we thus get

$$\frac1nE\ln Z_b(H_k(n,m,\tau))\ge\frac1n\ln EZ(H_k(n,m))+2\delta/3.$$

Hence, (22) yields that

$$P\Big[\frac1n\ln Z_b(H_k(n,m,\tau))<\frac1n\ln EZ(H_k(n,m))+\delta/2\Big]<\exp(-\varepsilon''n)\qquad(27)$$

with $\varepsilon''>0$.

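To complete the proof, consider the set $\mathcal A$ of all pairs $(H,\tau)$ of hypergraphs H on $V=\{1,\ldots,n\}$ with m edges and 2-colorings τ of H. Let

$$\mathcal A'=\Big\{(H,\tau)\in\mathcal A:\frac1n\ln Z_b(H)<\frac1n\ln EZ(H_k(n,m))+\delta/2\Big\},$$

$$\mathcal A''=\big\{(H,\tau)\in\mathcal A:\big||\tau^{-1}(1)|-|\tau^{-1}(0)|\big|>\zeta n\big\}.$$

Clearly, (27) shows that

$$|\mathcal A'\setminus\mathcal A''|\le\exp(-\varepsilon''n)|\mathcal A|.$$

Furthermore, since the number of hypergraphs H for which $\tau\in\{0,1\}^n$ is a 2-coloring is maximized for equitable τ, we have $|\mathcal A''|\le\exp(-\varepsilon'''n)|\mathcal A|$, with $\varepsilon'''>0$ sufficiently small. Hence,

$$|\mathcal A'|\le2\exp(-\varepsilon'''n)|\mathcal A|.\qquad(28)$$

Now, suppose that $\alpha_1,\alpha_2$ are such that

$$P\Big[\frac1n\ln Z(H_k(n,m))>\frac1n\ln EZ(H_k(n,m))-\alpha_1\Big]>\alpha_2.\qquad(29)$$

Since $H_k(n,m)$ is uniformly distributed over all $\binom{\binom nk}m$ hypergraphs with m edges, we obtain

$$|\mathcal A'|\ge\alpha_2\binom{\binom nk}mE[Z(H_k(n,m))]\exp(-\alpha_1n)\ge\alpha_2\exp(-\alpha_1n)\cdot|\mathcal A|.$$

Setting $\alpha_1=\varepsilon'''/2$ and comparing (28) with (29), we see that necessarily $\alpha_2=o(1)$. This proves the second assertion in the case that $r=r_{\mathrm{crit}}$.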

Finally, consider any density $r_{\mathrm{crit}}<r<r_{\mathrm{col}}$. We generate the random hypergraph $H_k(n,m)$ in two 'portions' $H_1$ and $H_2$. Namely, letting $m_1=r_{\mathrm{crit}}n$ and $m_2=(r-r_{\mathrm{crit}})n$, we let $H_1=H_k(n,m_1)$. Then $H_2$ is simply obtained by adding another $m_2$ random edges to $H_1$. By the above, we know that w.h.p.

$$Z(H_1)\le E[Z(H_1)]\cdot\exp(-\Omega(n)).$$

Furthermore, since a new random edge is bichromatic under a 2-coloring of $H_1$ with probability at most $1-2^{1-k}$, we have

$$E[Z(H_2)\mid H_1]\le Z(H_1)\cdot(1-2^{1-k})^{m_2}.$$

Thus, by Markov's inequality, w.h.p.

$$Z(H_k(n,m))=Z(H_2)\le E[Z(H_1)]\cdot\exp(-\Omega(n))(1-2^{1-k})^{m_2}=E[Z(H_k(n,m))]\exp(-\Omega(n)),$$

as claimed. □



4.4 Proof of Lemma 4.5

Since $\psi(1-x)=\psi(x)$, we only need to work with $x\le1/2$. Let $r=(2^{k-1}-1)(c/2^k+\ln2)$ for $|c|\le4$. Let $h(x)=-x\ln x-(1-x)\ln(1-x)$. Then

$$\psi(x)\le h(x)-\frac r{2^{k-1}-1}\big(1-x^k-(1-x)^k\big)=h(x)-(c/2^k+\ln2)\big(1-x^k-(1-x)^k\big).$$

Suppose that $x\ge2^{-k/2}$ but $x\le1/(1.01k)$. Then

$$\begin{aligned}\psi(x)&\le x(1-\ln x)-(c/2^k+\ln2)\big(1-(1-x)^k\big)+2^{-k}\\&\le x(1-\ln x)-(c/2^k+\ln2)\big(kx-(kx)^2\big)+2^{-k}\\&\le x\big[1-\ln x-k(1-kx)\ln2\big]+2^{3-k}<\psi(1/2),\end{aligned}\qquad(30)$$

provided that $k\ge k_0$ is large enough. Furthermore, for $1/(1.01k)<x<0.49$ we have

$$\begin{aligned}\psi(x)&\le h(x)-(c/2^k+\ln2)\big(1-(1-x)^k\big)+2^{-k}\\&\le h(x)-(c/2^k+\ln2)\big(1-\exp(-kx)\big)+2^{-k}\\&\le h(x)-\big(1-\exp(-kx)\big)\ln2+2^{3-k}<\psi(1/2),\end{aligned}\qquad(31)$$

again for $k\ge k_0$ large enough.

Finally, around $x=1/2$ we can expand ψ as follows. Since $\psi(1-x)=\psi(x)$, it is clear that $\psi'(1/2)=0$. Furthermore, $\psi''(1/2)=-4+o_k(1)$, and $\psi'''(1/2)\le h'''(1/2)+o_k(1)=o_k(1)$. Therefore, for $k\ge k_0$ large enough we can expand ψ around 1/2 as

$$\psi(\tfrac12-\delta)=\psi(\tfrac12)-(2+o_k(1))\delta^2+O(\delta^3).\qquad(32)$$

Thus, the lemma follows from (30), (31) and (32).



5 The local cluster: proof of Propositions 4.6 and 4.9

5.1 Outline 

In this section we prove Propositions 4.6 and 4.9. Fix an equitable 2-coloring $\sigma:V\to\{0,1\}$ and recall that an edge e of a hypergraph H is critical under σ if there is a color $i\in\{0,1\}$ and a vertex $v\in e$ such that $\sigma(v)=i$ and $\sigma(w)=1-i$ for all $w\in e\setminus\{v\}$. In this case, we say that v supports the edge e (under σ). We are going to study the size of the local cluster in the $H_k(n,m_1,m_2,\sigma)$ model from Section 4.3:

• Choose uniformly at random a set $E_1$ of $m_1$ edges that are critical with respect to σ.

• Choose uniformly at random a set $E_2$ of $m_2$ edges that are bichromatic under σ but not critical.

• Let $H_k(n,m_1,m_2,\sigma)=(V,E_1\cup E_2)$.

We are going to expose the edges of $H_k(n,m_1,m_2,\sigma)$ in two portions: let $H_1$ contain the $m_1$ critical edges, and let $H_2$ contain the rest. Let $\lambda=m_1/n$ be the expected number of edges that any one vertex supports. We will need the following simple expansion property of the random hypergraph $H_1$.

Lemma 5.1 Let $\zeta<1/3$. W.h.p. $H_1$ has the following property. Suppose that $S\subset V$ has size $|S|=\zeta n$. Then the total number of edges supported by vertices in S is bounded by $\zeta(e^2\lambda-\ln\zeta)n$.



Proof. We use a first moment argument. Let $\xi=e^2\lambda-\ln\zeta$ and $\mu=\xi\zeta$. The probability that there is a set S of size ζn that supports a total of μn edges is bounded by

$$\binom n{\zeta n}\binom{m_1}{\mu n}\zeta^{\mu n}\le\Big[\frac e\zeta\Big(\frac{e\lambda}{\xi}\Big)^{\xi}\Big]^{\zeta n}=o(1)$$

by our choice of ξ and because $\zeta<1/3$. □

Lemma 5.2 Let $l\ge0$ be fixed. W.h.p. the number of vertices that support precisely l edges is

$$(1+o(1))\,n\,\frac{\lambda^l}{l!\exp(\lambda)}.$$

Proof. The number of edges that any one vertex supports is binomial with mean λ. Hence, the Poisson approximation to the binomial distribution shows that the probability that a given vertex v supports precisely l edges is $(1+o(1))\frac{\lambda^l}{l!\exp(\lambda)}$. In effect, letting $X_l$ be the number of vertices with this property, we see that

$$EX_l=(1+o(1))\,n\,\frac{\lambda^l}{l!\exp(\lambda)}.$$

Furthermore, $X_l$ satisfies a Lipschitz condition: adding or removing a single edge can change the value of $X_l$ by at most one. Therefore, Azuma's inequality shows that $X_l=(1+o(1))\,n\,\frac{\lambda^l}{l!\exp(\lambda)}$ w.h.p. □
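A quick Monte Carlo illustration of Lemma 5.2 in Python. It assumes, as a simplification justified by symmetry for an equitable σ, that the supporting vertex of each of the $m_1=\lambda n$ critical edges is uniform on V and independent across edges; the simulation then compares the number of vertices supporting exactly l edges with the Poisson prediction $n\lambda^l/(l!e^\lambda)$. The parameters are arbitrary.

```python
import math
import random
from collections import Counter

def support_counts(n, lam, seed=0):
    """Simulate the supporting vertices of m1 = round(lam * n) critical edges, each assumed
    uniform on V; return a Counter mapping l -> number of vertices supporting exactly l edges."""
    rng = random.Random(seed)
    m1 = round(lam * n)
    load = Counter(rng.randrange(n) for _ in range(m1))
    return Counter(load[v] for v in range(n))

n, lam = 200_000, 3.0
counts = support_counts(n, lam)
for l in range(6):
    predicted = n * lam ** l / (math.factorial(l) * math.exp(lam))
    print(l, counts[l], round(predicted))
```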

In particular, Lemma 5.2 shows that the total number of vertices that do not support any edges is $(1+o(1))\exp(-\lambda)n$ w.h.p. Now, consider the following construction of a set $U\subset V$.

1. Initially, let U consist of all vertices that do not support any edges.

2. While there is a vertex $v\notin U$ such that every edge supported by v contains a vertex from U, add v to U.

The above is an adaptation of the 'whitening process' from [6] to random hypergraph 2-coloring. Let $H_U$ be the hypergraph with vertex set U and edge set

$$\{e\cap U:e\in E(H_1),\ |e\cap U|\ge2\}.$$

In general, this is going to be a non-uniform hypergraph.
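The two-step construction of U can be sketched directly. The Python function below is an illustrative implementation, not the authors' code; its input format (a list of (supporting vertex, edge) pairs for the critical edges of $H_1$) is an assumption. Because the process only ever adds vertices and the condition in step 2 is monotone in U, the order in which vertices are examined does not affect the final set.

```python
def whiten(n, crit):
    """Whitening process: crit is a list of (supporting_vertex, edge) pairs for H_1.

    Start with the vertices that support no edge, then repeatedly add any vertex v
    outside U all of whose supported edges contain a vertex of U.
    """
    supported = {v: [] for v in range(n)}
    for v, e in crit:
        supported[v].append(e)
    U = {v for v in range(n) if not supported[v]}
    changed = True
    while changed:
        changed = False
        for v in range(n):
            if v not in U and all(any(w in U for w in e) for e in supported[v]):
                U.add(v)
                changed = True
    return U

# Toy input for k = 3 with sigma = [1, 0, 0, 0, 1, 1]: vertices 0, 1, 2, 4 support edges
# among themselves (a "frozen" core); vertex 3 supports nothing; vertex 5 supports an edge
# containing 3.
crit = [(0, (0, 1, 2)), (1, (1, 0, 4)), (2, (2, 0, 4)), (4, (4, 1, 2)), (5, (5, 3, 1))]
print(sorted(whiten(6, crit)))  # [3, 5]
```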

Proposition 5.3 W.h.p. the set U has size

$$|U|=n\big[\exp(-\lambda)+\lambda(k-1)\exp(-2\lambda)+O(7.1^{-k})\big]$$

and enjoys the following properties.

U1. The set $S_0\subset U$ of vertices that do not support an edge has size $|S_0|=(1+o(1))n\exp(-\lambda)$.

U2. There is a set $S_1$ of size

$$(1+o(1))\,n\big[\lambda(k-1)\exp(-2\lambda)+O(7.1^{-k})\big]$$

such that all vertices in $S_1$ support exactly one edge that contains precisely one other vertex from U, which indeed belongs to $S_0$.

U3. Apart from the edges resulting from U2, $H_U$ contains no more than $n\,O(7.1^{-k})$ further edges.

We defer the proof of Proposition 5.3 to Section 5.2.

We say that $R\subset V$ is rigid if for any 2-coloring τ of $H_1$ such that $\tau(v)\ne\sigma(v)$ for some $v\in R$ we have

\{veR:T{v) ^C7{v)]\ >n/k^. 

In Section 5.3 we will prove the following.

Proposition 5.4 W.h.p. there is a rigid set $R\subset V\setminus U$ of size $|R|\sim|V\setminus U|$.

We now have sufficient information about the random hypergraph $H_k(n,m_1,m_2,\sigma)=H_1\cup H_2$ to prove Propositions 4.6 and 4.9.

Proof of Proposition 4.6. Proposition 4.6 deals with the random hypergraph $H_k(n,m,\sigma)$, in which the number of critical edges has a binomial distribution $\mathrm{Bin}(m,k/(2^{k-1}-1))$. Hence, by Chernoff bounds the number of critical edges is $(1+o(1))nkr/(2^{k-1}-1)=(1+o(1))\lambda n$ w.h.p., with $\lambda=kr/(2^{k-1}-1)$. Thus, to study $H_k(n,m,\sigma)$ it suffices to investigate $H_k(n,m_1,m_2,\sigma)$ with $m_1\sim\lambda n$ and $m_2\sim m-\lambda n$.

To prove Proposition 4.6 we merely need to derive an upper bound on the size $|C(\sigma)|$ of the local cluster. Thus, it suffices to bound the size of the local cluster

$$C^1(\sigma)=\{\tau:\mathrm{dist}(\sigma,\tau)<2^{-k/2}n,\ \tau\text{ is a 2-coloring of }H_1\}$$

of $H_1$. By Proposition 5.4 we have w.h.p.

$$\frac1n\ln|C(\sigma)|\le\frac1n\ln|C^1(\sigma)|=\frac1n\ln Z(H_U).$$

Hence, we just need to bound the number $Z(H_U)$ of 2-colorings of $H_U$. By Proposition 5.3 we may assume that $H_U$ has the properties U1 to U3.

If this is indeed the case, we can estimate $\ln Z(H_U)$ as follows. Let $H_U'$ be the hypergraph obtained from $H_U$ by omitting all edges that are incident with a vertex from $U\setminus(S_0\cup S_1)$. Then each edge of $H_U'$ has size 2 and contains precisely one vertex from $S_1$ and one vertex from $S_0$. Moreover, each vertex from $S_1$ is incident with exactly one such edge, and indeed supports this edge under σ. Hence, $H_U'$ is just a collection of stars in which all non-isolated vertices in $S_1$ are leaves, and therefore the total number of 2-colorings of $H_U'$ is simply equal to $2^{|S_0|}$. Thus,

$$\frac1n\ln|C(\sigma)|\le\frac1n\ln Z(H_U)\le\frac1n\ln Z(H_U')\le\frac{|S_0|}n\ln2\le(1+o(1))\exp(-\lambda)\ln2.$$

A straightforward computation shows that this is indeed less than $\frac1n\ln E[Z_e(H_k(n,m))]$ if $r<(2^{k-1}-1)\ln2$. □
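Why the star forest $H_U'$ has exactly $2^{|S_0|}$ colorings: each leaf in $S_1$ is forced to take the color opposite to its centre in $S_0$, while the centres are unconstrained. A brute-force Python check on a toy star forest, assuming (as in the text) that the vertex set is $S_0\cup S_1$ and that every edge joins one vertex of $S_1$ to one vertex of $S_0$; the vertex labels are hypothetical.

```python
from itertools import product

def count_colorings(vertices, edges):
    """Count maps vertices -> {0, 1} under which no edge is monochromatic (brute force)."""
    vertices = list(vertices)
    count = 0
    for colors in product((0, 1), repeat=len(vertices)):
        c = dict(zip(vertices, colors))
        if all(len({c[v] for v in e}) == 2 for e in edges):
            count += 1
    return count

S0 = ["a", "b", "c", "d"]
S1 = ["x1", "x2", "x3", "y1", "z1"]
# Each leaf in S1 is joined to exactly one centre in S0; centres c and d have no leaves.
star_edges = [("x1", "a"), ("x2", "a"), ("x3", "a"), ("y1", "b"), ("z1", "b")]
print(count_colorings(S0 + S1, star_edges), 2 ** len(S0))  # both equal 16
```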

Proof of Proposition 4.9. We start by obtaining an upper bound on the size of $C(\sigma)$. Let $\lambda=(1+\beta)kr/(2^{k-1}-1)$ for some $\beta<1/k$. We first study the size of the local cluster $C^1(\sigma)$ in $H_1$. By the same argument as in the proof of Proposition 4.6 above, w.h.p. we have

$$\frac1n\ln|C^1(\sigma)|\le\frac1n\ln Z(H_U')\le(1+o(1))\exp(-\lambda)\ln2,$$

where $H_U'$ is a collection of stars as above. While clearly $\frac1n\ln|C(\sigma)|\le\frac1n\ln|C^1(\sigma)|$, we need a slightly tighter estimate of $|C(\sigma)|$.

To obtain this estimate, we need to take the edges of $H_2$ into consideration. Let $E_2'$ consist of all edges $e\in H_2$ that contain precisely two vertices from $S_0\setminus N(S_1)$ and in which all vertices in $V\setminus U$ have the same color under σ. Since $H_2$ is independent of $H_1$, the number of these edges is binomially distributed with mean

$$\frac{|S_0\setminus N(S_1)|^2}{n^2}\binom k2\frac{m_2}{2^{k-1}-k-1}\sim n\binom k2\exp(-2\lambda)\frac{m_2/n}{2^{k-1}-k-1}=\mu_2.$$

By Chernoff bounds, we indeed have $|E_2'|\sim\mu_2$ w.h.p. Furthermore, the expected number of vertices in $S_0$ that are incident with two edges from $E_2'$ is at most $k^{O(1)}\exp(-3\lambda)n$; as this number satisfies a Lipschitz condition, it is concentrated by Azuma's inequality. Hence, w.h.p. $E_2'$ contains a subset $E_2''$ of size

$$|E_2''|/n\ge\binom k2\exp(-2\lambda)\ln2-k^{O(1)}8^{-k}$$

such that $E_2''$ induces a matching in $S_0$. By construction, this matching is disjoint from $H_U'$. Hence, w.h.p.

$$\frac1n\ln|C(\sigma)|\le\frac1n\ln|C^1(\sigma)|-\frac{|E_2''|}n\ln2\le\exp(-\lambda)\Big[1-\binom k2\exp(-\lambda)\ln2\Big]\ln2+k^{O(1)}8^{-k}.\qquad(33)$$



To derive a matching lower bound, notice that Proposition 5.3 implies that all but $O(7.1^{-k})n$ edges of $H_U$ belong to the matching $H_U'$ w.h.p. Let $F_1$ be the set of all vertices that are reachable from the edges in $H_U\setminus H_U'$. Then $|F_1|\le4|H_U\setminus H_U'|\le O(7.1^{-k})n$ w.h.p. While we cannot say much about the entropy of the vertices in $F_1$, it is clear that $H_U-F_1$ is just a matching from $S_1\setminus F_1$ to $S_0\setminus F_1$. Therefore, w.h.p.

$$\frac1n\ln|C^1(\sigma)|\ge\frac{|S_0\setminus F_1|}n\ln2\ge\big(\exp(-\lambda)-O(7.1^{-k})\big)\ln2.$$

Let $E_3'$ be the set of all edges $e\in H_2$ that contain at least three vertices from U such that all vertices in $e\setminus U$ have the same color under σ. Then $E|E_3'|\le O(k^3/2^k)\cdot(|U|/n)^3m\le k^{O(1)}8^{-k}n$. Let $F_3$ be the set of all vertices in $H_U$ that are reachable from $\{v\in U:\exists e\in E_3':v\in e\}$. Since $|E_3'|$ is binomially distributed, we have $|F_3|\le k^{O(1)}8^{-k}n$ w.h.p. Furthermore, let $E_2'$ be as above. Let $F_2'$ be the set of all vertices in $N(S_1)\cup\big(U\setminus(S_0\cup F_1\cup F_3)\big)$ that are incident with an edge of $H_2$. Then $|F_2'|$ is binomially distributed with mean at most $k^{O(1)}2^{-k}\exp(-\lambda)\frac{|U\setminus S_0|}nm\le k^{O(1)}8^{-k}n$ (by Proposition 5.3), and thus w.h.p. $|F_2'|\le k^{O(1)}8^{-k}n$ by Chernoff bounds. In addition, let $F_2''$ be the set of all vertices in $S_0$ that are incident with at least two edges from $E_2'$. As we saw above, $|F_2''|\le k^{O(1)}8^{-k}n$ w.h.p. Let $F_2$ be the set of all vertices in $H_U'$ that are reachable from $F_2'\cup F_2''$; since $H_U'$ is a matching, we have $|F_2|\le2|F_2'\cup F_2''|=k^{O(1)}8^{-k}n$ w.h.p. Finally, let $E_2'''$ be the set of all edges in $E_2'$ that do not contain a vertex from $F_2$. Then $|E_2'''|/n\le\binom k2\exp(-2\lambda)\ln2+k^{O(1)}8^{-k}$ w.h.p.

Now, $E_2'''$ and $H_U'$ simply induce a matching on $(S_0\cup S_1)\setminus(F_1\cup F_2\cup F_3)$, and this matching is disconnected from all other edges of $H_1\cup H_2$ that are not already 2-colored given the colors assigned to the vertices in $V\setminus U$. Hence, w.h.p. the number of 2-colorings satisfies

$$\frac1n\ln|C(\sigma)|\ge\frac1n\ln|C^1(\sigma)|-\frac{|E_2'''|}n\ln2-\frac{|F_1\cup F_2\cup F_3|}n\ln2\ge\exp(-\lambda)\Big[1-\binom k2\exp(-\lambda)\ln2\Big]\ln2-O(7.1^{-k}).\qquad(34)$$



To prove the first claim, we need to combine (33) and (34) with a lower bound on the expected number of $(1+\beta)$-critical 2-colorings. Let $q=k/(2^{k-1}-1)$. The probability $\eta_{1+\beta}$ that an equitable σ is a $(1+\beta)$-critical 2-coloring of $H_k(n,m)$ satisfies

$$\ln\eta_{1+\beta}\sim m\ln(1-2^{1-k})+\ln P[\mathrm{Bin}(m,q)=(1+\beta)qm].$$

Indeed, the first summand accounts for the probability that σ is a 2-coloring, and the second summand is the probability that, given that σ is a 2-coloring, the number of critical edges equals $(1+\beta)qm$. By the binomial estimate from Section 3, for sufficiently small $\beta>0$ we have

$$\frac1n\ln P[\mathrm{Bin}(m,q)=(1+\beta)qm]\ge-\frac{qm}n\beta^2\ge-k\beta^2.$$

Hence,

$$\frac1n\ln E[Z_{1+\beta}]\ge\frac1n\ln EZ-k\beta^2.$$

If $r=2^{k-1}\ln2-c$, then a direct computation shows that

$$\frac1n\ln E[Z]=\ln2+r\ln(1-2^{1-k})\ge\frac{2c-\ln2}{2^k}-O(4^{-k}).\qquad(35)$$

Consequently,

$$\frac1n\ln E[Z_{1+\beta}]\ge\frac{2c-\ln2}{2^k}-O(4^{-k})-k\beta^2.\qquad(36)$$

Choose c (and thus r) such that with $\lambda_0=kr/(2^{k-1}-1)$ we have

$$\exp(-\lambda_0)\Big[1-\binom k2\exp(-\lambda_0)\ln2\Big]\ln2-7^{-k}=\frac1n\ln EZ+16^{-k}.\qquad(37)$$

A straightforward computation using (35) shows that $c=\ln2+o_k(1)$. Furthermore, (34) and (37) show that for this r w.h.p. in the planted model $H_k(n,m,\sigma)$ the local cluster $C(\sigma)$ has size $|C(\sigma)|\ge\exp(\Omega(n))EZ$. Let

$$f(\beta)=\exp(-(1+\beta)\lambda_0)\Big[1-\binom k2\exp(-(1+\beta)\lambda_0)\ln2\Big]\ln2.$$

Expanding $f(\cdot)$ around $\beta=0$, we find that

$$f(\beta)-f(0)=-\beta\lambda_0\big(\exp(-\lambda_0)\ln2+k^{O(1)}4^{-k}\big)+O(\beta^2)\,k^{O(1)}2^{-k}.$$

Hence, (36) implies that for $\beta^*=3^{-k}$ we get

$$f(\beta^*)+7^{-k}<\frac1n\ln E[Z_{1+\beta^*}].$$

Further, (33) implies that with $m_1=(1+\beta^*)\lambda_0n$ and $m_2=m-m_1$, in $H_k(n,m_1,m_2,\sigma)$ w.h.p. the local cluster size satisfies $\frac1n\ln|C(\sigma)|<\frac1n\ln E[Z_{1+\beta^*}]$. This means that r and β* as above satisfy the conditions in Proposition 4.9. □



5.2 Proof of Proposition 5.3

Let $U_0$ be the set of all vertices that do not support any edge. Then w.h.p. $|U_0|\sim n\exp(-\lambda)$ by Lemma 5.2. For each vertex v let $s(v)$ be the number of edges that v supports. Let $U_1$ be the set of all vertices v with $s(v)\ge1$ such that all edges supported by v contain a vertex from $U_0$.

Lemma 5.5 W.h.p. we have the following.

1. The number of vertices v with $s(v)=1$ such that the edge e supported by v contains exactly one vertex from $U_0$ is $n\big[\lambda(k-1)\exp(-2\lambda)+O(7.3^{-k})\big]$.

2. The number of vertices v with $s(v)=1$ such that the edge e supported by v contains more than one vertex from $(U_0\cup U_1)\setminus\{v\}$ is $n\cdot O(7.3^{-k})$.

3. The number of vertices v with $s(v)>1$ such that all edges e supported by v contain a vertex from $U_0$ is bounded by $n\cdot O(7.3^{-k})$.



Proof. Let $X$ be the number of vertices as in 1. By Lemma 5.2 the number of vertices $v$ with $s(v) = 1$ is $(1+o(1))\lambda\exp(-\lambda)n$ w.h.p. Furthermore, given that $v$ satisfies $s(v) = 1$, the $k-1$ other vertices in the unique edge $e$ that $v$ supports are uniformly distributed over the opposite color class. Hence, again by Lemma 5.2, the number of non-supporting vertices amongst these $k-1$ vertices has a binomial distribution $\mathrm{Bin}(k-1, (1+o(1))\exp(-\lambda))$ w.h.p. In this case, the probability that exactly one of the $k-1$ other vertices is non-supporting is $(k-1)\exp(-\lambda) + O(k^2\exp(-2\lambda))$. Hence, we see that

$$\mathrm{E}X = (1+o(1))\lambda\exp(-\lambda)n \cdot \left[(k-1)\exp(-\lambda) + O(k^2\exp(-2\lambda))\right] = n\left[\lambda(k-1)\exp(-2\lambda) + O(7.9^{-k})\right].$$

Furthermore, $X$ satisfies $X = \mathrm{E}X + o(n)$ w.h.p., because the number of vertices $v$ with $s(v) = 1$ is concentrated by Lemma 5.2. In addition, for all such $v$ with $\sigma(v) = 0$ the events that the edge $e_v$ supported by $v$ contains a non-supporting vertex are mutually independent. Hence, this number has a binomial distribution and is therefore concentrated by Chernoff bounds. As the same is true of the vertices $v$ with $\sigma(v) = 1$, $X$ is concentrated about its expectation. The other two claims follow from a similar argument. □
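The elementary estimate used above is that $\mathrm{P}[\mathrm{Bin}(k-1,q) = 1] = (k-1)q(1-q)^{k-2}$ differs from $(k-1)q$ by a nonnegative amount of at most $(k-1)^2q^2$, by Bernoulli's inequality $(1-q)^{k-2} \ge 1-(k-2)q$. The short Python sketch below checks this numerically; here $q$ stands in for $\exp(-\lambda)$, the $(1+o(1))$ factor is ignored, and the choice $q = 2^{-k}$ is purely illustrative.

```python
def prob_exactly_one(k, q):
    """P[Bin(k-1, q) = 1]: exactly one of the k-1 other vertices is non-supporting."""
    return (k - 1) * q * (1 - q) ** (k - 2)


def first_order_gap(k, q):
    """Error of the first-order approximation (k-1) q; lies in [0, (k-1)^2 q^2]."""
    return (k - 1) * q - prob_exactly_one(k, q)


if __name__ == "__main__":
    for k in (3, 5, 10, 20):
        q = 2.0 ** -k  # illustrative stand-in for exp(-lambda)
        print(k, prob_exactly_one(k, q), first_order_gap(k, q), (k - 1) ** 2 * q ** 2)
```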

Thus, the above lemma shows that $|U_1|/n \le \lambda(k-1)\exp(-2\lambda) + O(7.3^{-k})$ w.h.p. Furthermore, the hypergraph $H_{U_0 \cup U_1}$ mostly consists of isolated vertices and edges of size 2 (and only very few larger edges).

We now need to analyze how the process for the construction of the set $U$ proceeds. All vertices in $V \setminus (U_0 \cup U_1)$ support at least one edge that does not contain a vertex from $U_0$. We will now construct sets $U_j$, $j \ge 2$, inductively as follows:

let $U_j$ be the set of all vertices $v \in V \setminus \bigcup_{i<j} U_i$ such that all edges supported by $v$ contain a vertex from $\bigcup_{i \le j-1} U_i$.

Let $U^* = \bigcup_{j \ge 2} U_j$.
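The layered construction of $U_2, U_3, \ldots$ can be sketched as follows. The input format is a hypothetical choice for illustration: `supported` maps each vertex to the list of edges (tuples of vertices) that it supports, and `u0`, `u1` are the sets $U_0$ and $U_1$. Each round collects every vertex outside the current union whose supported edges all meet that union, which is exactly the definition of $U_j$; the set $U^*$ is the union of the returned layers.

```python
def layers(n, supported, u0, u1):
    """Return [U_2, U_3, ...] for vertices 0..n-1.

    U_j consists of the vertices outside U_0, ..., U_{j-1} all of whose supported
    edges contain a vertex of U_0 | ... | U_{j-1}.
    """
    union = set(u0) | set(u1)
    result = []
    while True:
        new = {v for v in range(n)
               if v not in union
               and all(any(w in union for w in e) for e in supported.get(v, []))}
        if not new:
            return result  # no further layer: the construction has stabilized
        result.append(new)
        union |= new
```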

Lemma 5.6 W.h.p. $H_k$ has the following property. Let $T$ be a set of size $|T| \le n/2^{k-1}$. Then the number of critical edges that are supported by a vertex $v \notin T$ but that contain a vertex from $T$ is bounded by $36k^3 2^{-k} n$.



Proof. We use a first moment argument. Let $t = 2^{1-k}$ and $\mu = 36k^3/2^k$. Then the probability of the event described above is bounded by



as claimed. 
Lemma 5.7 W.h.p. we have $|U^*| \le n \cdot O(7.2^{-k})$.

Proof. This is based on a branching process argument. More precisely, we consider the following stochastic process. At each time, a vertex can be either alive, neutral, or dead. Initially, all vertices in $U_0$ are dead, all vertices in $U_1$ are alive, and all other vertices are neutral. In each round of the process an alive vertex $a$ is chosen arbitrarily (once there is no alive vertex left, the process stops). Every neutral vertex $v$ such that all edges $e$ supported by $v$ contain either $a$ or a dead vertex is declared alive, and then $a$ is declared dead.



Let $A_t$ be the set of alive vertices after $t$ steps of the process (in particular, $A_0 = U_1$). Let $T_* = |A_0|$, $T^* = 2n \cdot 7.2^{-k}$, and let $T$ be the actual stopping time of the process. The goal is to show that w.h.p.
$$T \le T_* + T^*,$$
which implies that $|U^*| \le T^*$.

To prove this bound, we proceed as follows. Consider a time $t \le T_* + T^*$. There are several ways in which a neutral vertex $v$ can become alive.

Case 1: $s(v) = 1$. By Lemma 5.2 the total number of such vertices is bounded by $(1+o(1))\lambda\exp(-\lambda)n$ w.h.p. Moreover, $v$ can become alive only if the unique edge that $v$ supports contains $a$. The probability of this event is bounded by $2k/n$. Hence, the expected number of new alive vertices that arise in this way is at most $(1+o(1))2k\lambda\exp(-\lambda)$.

Case 2: $s(v) > 1$ and $v$ has a dead neighbor. By Lemma 5.6 and our assumption on $t$, the total number of vertices with a dead neighbor is bounded by $36k^3 2^{-k} n$. If $v$ is declared alive at time $t$, then all edges supported by $v$ that contain no dead vertex must contain $a$, and there is at least one such edge. The probability of this event is bounded by $2k/n$. Hence, the expected number of vertices that become alive in this way is at most $(1+o(1))72k^4 2^{-k}$.

Case 3: $s(v) > 1$ and $v$ does not have a dead neighbor. In this case all $s(v) \ge 2$ edges that $v$ supports contain $a$. The probability of this event is $O(n^{-2})$. Hence, the expected number of vertices that become alive in this way is $o(1)$.

Thus, conditioning on the previous history $\mathcal{F}_{t-1}$ of the process, we obtain
$$\mathrm{E}\left[|A_t| - |A_{t-1}| \,\middle|\, \mathcal{F}_{t-1}\right] \le k^5/2^k.$$

Furthermore, for all neutral $v$ the events that $v$ is activated at time $t$ given $\mathcal{F}_{t-1}$ are mutually independent. Hence, $|A_t| - |A_{t-1}|$ given $\mathcal{F}_{t-1}$ is stochastically dominated by a binomial variable $B_t$ with mean $k^5/2^k$. Now, if $T > T_* + T^*$, then at least $T^*$ vertices got activated by time $T_* + T^*$, i.e., $\sum_{t=1}^{T_*+T^*} B_t \ge T^*$. Since
$$\mathrm{E}\sum_{t=1}^{T_*+T^*} B_t \le (T_* + T^*)k^5/2^k \le T^*/2,$$
the Chernoff bound shows that $\mathrm{P}\left[\sum_{t=1}^{T_*+T^*} B_t \ge T^*\right] \le \exp(-\Omega(n))$. □
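For concreteness, the last step can be spelled out with the standard multiplicative Chernoff bound. This is a sketch that assumes the dominating variables can be coupled so that their sum is a sum of independent indicator variables with mean at most $T^*/2$:

```latex
% Multiplicative Chernoff bound for a sum X of independent indicators with E[X] <= mu:
%   P[X >= (1 + delta) mu] <= exp(-delta^2 mu / 3),   0 < delta <= 1.
% Apply it with X = \sum_{t=1}^{T_*+T^*} B_t, mu = T^*/2 and delta = 1:
\[
  \mathrm{P}\Big[\sum_{t=1}^{T_*+T^*} B_t \ge T^*\Big]
  \le \exp\Big(-\frac{T^*}{6}\Big)
  = \exp\Big(-\frac{7.2^{-k}\,n}{3}\Big)
  = \exp(-\Omega(n)),
\]
% since T^* = 2n \cdot 7.2^{-k} is linear in n for fixed k.
```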

Proof of Proposition 5.3. The above discussion allows us to get a close understanding of the combinatorial structure of the hypergraph $H_U$. By Lemma 5.7 we have $|U^*| \le n \cdot O(7.2^{-k})$. Let $E^*$ be the set of all edges supported by a vertex in $U^*$ that contain a vertex in $U_0 \cup U_1$. Lemma 5.1 implies that w.h.p. $|E^*| \le O(k)|U^*| \le n \cdot O(7.19^{-k})$. Hence, the set $U_1'$ of all vertices $u \in U_0 \cup U_1$ that occur in an edge from $E^*$ has size $|U_1'| \le n \cdot O(7.18^{-k})$ w.h.p. Furthermore, let $U_1^*$ be the set of all vertices $v \in U_0 \cup U_1$ such that either $v \in U_1'$, or there is an edge $e$ supported by a vertex in $U_1$ that contains $v$ and another vertex from $U_0 \cup U_1$, or $v \in U_0$ occurs in an edge supported by a vertex $w \in U_1' \cap U_1$. Then by Lemma 5.5 we have $|U_1^*| \le n \cdot O(7.17^{-k})$.

In summary, we have shown that $H_U$ has the following structure w.h.p.

• The set $U_0$ of non-supporting vertices has size $(1+o(1))n\exp(-\lambda)$.

• There is a set $U_1 \setminus U_1^*$ of size
$$(1+o(1))n\left[\lambda(k-1)\exp(-2\lambda) + O(7.17^{-k})\right]$$
such that in $H_U$ all vertices in $U_1 \setminus U_1^*$ support exactly one edge that contains precisely one other vertex from $U$, which indeed belongs to $U_0$.

• Apart from these, $H_U$ contains no more than $n \cdot O(7.17^{-k})$ further edges.

This completes the proof of Proposition 5.3. □

5.3 Proof of Proposition 5.4

As a first step, we will identify a large set of rigid vertices. To this end, we need to say something about the number of edges that the vertices in $V \setminus U$ support. Let $l = 10$.

Lemma 5.8 W.h.p. the number of vertices $v \notin U$ that support fewer than $l$ edges that do not contain a vertex from $U$ is bounded by $\frac{\lambda^l}{2}\exp(-\lambda)n$.

Proof. By Lemma 5.2 the total number of vertices that support fewer than $l$ edges is at most $\frac{\lambda^l}{l!}\exp(-\lambda)n$ w.h.p. Moreover, applying Lemma 5.6 to the set $U$, we see that w.h.p. no more than $36k^3n/2^k < 0.9 \cdot \frac{\lambda^l}{2}\exp(-\lambda)n$ critical edges supported by a vertex in $V \setminus U$ contain a vertex from $U$. Each of these edges can create at most one additional vertex in $V \setminus U$ that supports fewer than $l$ edges without a vertex from $U$. □

For each $v \notin U$ let $s'(v)$ be the number of edges supported by $v$ that do not contain a vertex from $U$. By the construction of $U$, we have $s'(v) \ge 1$ for all $v \notin U$. Furthermore, given the sequence $(s'(v))_{v \in V \setminus U}$, the distribution of the sub-hypergraph of $H_k$ induced on $V \setminus U$ is very simple: it is obtained by choosing, for each vertex $v \in V \setminus U$ independently, $s'(v)$ edges supported by $v$ and containing a random set of $k-1$ vertices from $V \setminus U$ of color $1 - \sigma(v)$. This follows because the construction of the set $U$ merely imposes the condition that none of the $s'(v)$ remaining edges supported by $v$ contains a vertex from $U$.

We now decompose the random edges of the sub-hypergraph $H_k - U$ into two portions. The first portion $\mathcal{M}$ contains, for each vertex $v$, one random edge supported by $v$ and containing $k-1$ vertices of color $1 - \sigma(v)$ (with no vertex from $U$, of course). The second portion $\mathcal{H}$ contains the remaining $s'(v) - 1 \ge 0$ random edges supported by $v$ and containing $k-1$ vertices of color $1 - \sigma(v)$ (again, none of them from $U$). This decomposition will allow us to construct the desired set $R$ in two independent steps.

The first step is to find a 'core' in the hypergraph $\mathcal{H}$.

CR1. Initially, let $S$ contain all $v \in V \setminus U$ that support at least $l/2$ edges.

CR2. While there is $v \in S$ that supports fewer than $l/2$ edges consisting of vertices of $S$ only, remove $v$ from $S$.
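The peeling procedure CR1–CR2 can be sketched in Python as follows. This is an illustration under hypothetical input conventions: `supported` maps each vertex to the list of edges it supports, `candidates` plays the role of the vertex set scanned in CR1, and `threshold` plays the role of $l/2$. Because removing a vertex can only decrease the counts of the remaining vertices, the outcome does not depend on the order of removals: it is the largest subset of the initial $S$ in which every vertex supports at least `threshold` edges lying inside the subset.

```python
def peel_core(candidates, supported, threshold):
    """CR1-CR2: start from the vertices supporting >= threshold edges, then repeatedly
    remove any vertex that supports fewer than `threshold` edges lying entirely inside S."""
    # CR1: the initial set S.
    S = {v for v in candidates if len(supported.get(v, [])) >= threshold}
    changed = True
    while changed:  # CR2: repeat until no vertex of S violates the condition.
        changed = False
        for v in list(S):
            inside = sum(1 for e in supported[v] if all(w in S for w in e))
            if inside < threshold:
                S.discard(v)
                changed = True
    return S
```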

Let $C = S^*$ be the final outcome of this process. In order to study $|C|$, we need the following expansion property of the random hypergraph $H_k$.

Lemma 5.9 W.h.p. the random hypergraph $H_k$ has the following property. Let $T \subset V$ be a set of size $tn$ with $t \le 1/(e^3k^4)$. Then there are no more than $2tn$ edges that are supported by a vertex in $T$ and that contain a second vertex from $T$.

Proof. We use a first moment argument. The probability that there is a set T that violates the above property 
is bounded by 
Lemma 5.10 W.h.p. we have $|(V \setminus U) \setminus C| \le \lambda^l\exp(-\lambda)n$.



Proof. Assume that $|(V \setminus U) \setminus C| > \lambda^l\exp(-\lambda)n$. By Lemma 5.8 we may assume that the initial set $S$ contains all but at most $\frac{\lambda^l}{2}\exp(-\lambda)n$ vertices of $V \setminus U$. Hence, if $|(V \setminus U) \setminus C| > \lambda^l\exp(-\lambda)n$, then at some point the process CR1–CR2 must have removed a set $T$ of size $\frac{\lambda^l}{2}\exp(-\lambda)n$ from the original set $S$. This set $T$ has the property that each vertex in $T$ supports $l/2 > 2$ edges, each of which must contain another vertex from $T$. But by Lemma 5.9 no such set $T$ exists w.h.p. □

Having constructed the set $C$, we are now going to 'attach' more vertices from $V \setminus U$ to it via the following process.

A1. Let $A_0 = C$.

A2. For $t \ge 1$, let $A_t$ be the set of all vertices $v \in V \setminus U$ such that either $v \in A_{t-1}$ or the edge $e \in \mathcal{M}$ supported by $v$ has its other $k-1$ vertices in $A_{t-1}$.

Let $A = \bigcup_{t \ge 0} A_t$. Observe that actually $A = A_n$, i.e., the process becomes stationary after at most $n$ steps.
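The attachment process A1–A2 can be sketched in the same way, assuming hypothetically that `m_edge` maps each vertex $v$ to its single edge in $\mathcal{M}$ (a tuple containing $v$) and that `core_set` is $C$. Each round adds every vertex whose $\mathcal{M}$-edge has its other $k-1$ vertices in the current set, judged against the set from the previous round, and the loop stops once a round adds nothing, i.e., once the process has become stationary.

```python
def attach(vertices, m_edge, core_set):
    """A1-A2: grow A_0 = C by adding v whenever the M-edge of v has its other
    k-1 vertices in A_{t-1}; stop once A_t = A_{t-1}."""
    A = set(core_set)  # A1
    while True:
        # A2: the comprehension is evaluated against the previous set A_{t-1}.
        new = {v for v in vertices
               if v not in A and v in m_edge
               and all(w in A for w in m_edge[v] if w != v)}
        if not new:
            return A
        A |= new
```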

Lemma 5.11 W.h.p. the outcome of the above process satisfies $|A| = |V \setminus U| - o(n)$.

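Proof. Let $A_t$ be the set constructed after $t$ steps of the above process, with $A_0 = C$ and $A_{-1} = \emptyset$. Let $\mathcal{H}_t$ be the history of the process up to time $t$. Let $v \in V \setminus (U \cup A_t)$ be a vertex, and let $e_v \in \mathcal{M}$ be the random edge supported by $v$. The only conditioning that $\mathcal{H}_t$ imposes on $e_v$ is that $e_v$ has at least one vertex $w \ne v$ that does not lie in $A_{t-1}$. Hence,
$$\mathrm{P}[v \notin A_{t+1} \mid \mathcal{H}_t] = \mathrm{P}[e_v \setminus \{v\} \not\subset A_t \mid \mathcal{H}_t] \le (k-1)\cdot\frac{|V \setminus (U \cup A_t)|}{|V \setminus (U \cup A_{t-1})|}. \qquad (38)$$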

To analyze the quantity on the right, let $a_t = |A_t|/|V \setminus U|$ for $t \ge -1$. Then Lemma 5.10 implies that w.h.p. $a_0 \ge 1 - \lambda^l\exp(-\lambda)$. With this notation, (38) reads
$$\mathrm{E}[1 - a_{t+1} \mid \mathcal{H}_t] \le \frac{(k-1)(1-a_t)^2}{1 - a_{t-1}}.$$

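Furthermore, given $\mathcal{H}_t$, for all vertices $v \in V \setminus (U \cup A_t)$ the events $\{v \notin A_{t+1}\}$ are mutually independent (because each is determined by the edge $e_v$ supported by $v$). Therefore, the number of $v \in V \setminus (U \cup A_t)$ such that $v \notin A_{t+1}$ is stochastically dominated by a binomial distribution with mean $|V \setminus U| \cdot \frac{(k-1)(1-a_t)^2}{1-a_{t-1}}$. By Chernoff bounds, with probability $1 - o(1/n)$ we therefore see that the number of $v \in V \setminus (U \cup A_t)$ such that $v \notin A_{t+1}$ is at most $|V \setminus U| \cdot \frac{(k-1)(1-a_t)^2}{1-a_{t-1}} + o(n)$. Hence,
$$\mathrm{P}\left[a_{t+1} < 1 - \frac{(k-1)(1-a_t)^2}{1-a_{t-1}} - o(1) \,\middle|\, \mathcal{H}_t\right] = o(1/n), \qquad (39)$$
and thus, by the union bound over the at most $n$ steps of the process, the bound $a_{t+1} \ge 1 - \frac{(k-1)(1-a_t)^2}{1-a_{t-1}} - o(1)$ holds for all $0 \le t < n$ simultaneously w.h.p.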
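Now, consider the (deterministic) recurrence
$$\alpha_{-1} = 0, \qquad \alpha_0 = 1 - \lambda^l\exp(-\lambda), \qquad \alpha_{t+1} = 1 - \frac{(k-1)(1-\alpha_t)^2}{1-\alpha_{t-1}}.$$
It is straightforward to verify that $\lim_{t \to \infty} \alpha_t = 1$. Therefore, (39) implies that w.h.p.
$$\lim_{t \to \infty} |V \setminus (U \cup A_t)|/n = 0,$$
and thus $|V \setminus (U \cup A)| = o(n)$ w.h.p. □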

Proof of Proposition 5.4. We are left to show that w.h.p. all vertices in $A$ are $n/k^5$-rigid. We start by proving that w.h.p. all vertices in $C$ are $n/k^5$-rigid. Suppose that there is another 2-coloring $\tau$ of $C$ such that the set
$$\Delta = \{v \in C : \sigma(v) \ne \tau(v)\}$$
has size $0 < |\Delta| \le n/k^5$. By the construction of $C$, each vertex $v \in \Delta$ supports at least 3 edges that consist of vertices in $C$ only. As these edges are bichromatic under $\tau$, each of them must contain a second vertex in $\Delta$. Hence, there are at least $3|\Delta|$ edges that are supported by a vertex in $\Delta$ (under $\sigma$) and that contain a second vertex in $\Delta$. But Lemma 5.9 shows that w.h.p. there is no such set $\Delta$ of size $0 < |\Delta| \le n/k^5$. This shows that all vertices in $C$ are $n/k^5$-rigid w.h.p.

Furthermore, the construction of $A$ ensures that any 2-coloring $\tau$ of $H_k$ such that $\tau(v) \ne \sigma(v)$ for some $v \in A$ is indeed such that $\tau(w) \ne \sigma(w)$ for some $w \in C$. This shows that any $v \in A$ is $n/k^5$-rigid w.h.p., because any $w \in C$ is. □



6 A closer look at the internal entropy: proof of Corollary 1.4

6.1 Outline 

Throughout this section, we let $f_0(n)$ denote a function such that $f_0(n) = o(n)$ as $n \to \infty$. Let $\sigma \in \{0,1\}^n$ be such that $\big||\sigma^{-1}(0)| - |\sigma^{-1}(1)|\big| \le f_0(n)$. In addition, let $\sigma_0$ be an equitable 2-coloring. To prove Corollary 1.4 we need to prove that the size $|\mathcal{C}(\sigma)|$ of the local cluster in the planted model $H_k(n,m,\sigma)$ is tightly concentrated. To accomplish that, we need to study the set $U$ from Section 5. That is, $U \subset V$ is constructed as follows.

1. Initially, let $U$ consist of all vertices that do not support any edges.

2. While there is a vertex $v \notin U$ such that every edge supported by $v$ contains a vertex from $U$, add $v$ to $U$.
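Steps 1–2 describe a monotone peeling process, so the final set $U$ does not depend on the order in which vertices are added. A minimal linear-time Python sketch, assuming hypothetically that `supported` maps vertices in $\{0, \ldots, n-1\}$ to the lists of edges they support, keeps for every vertex a counter of its supported edges that do not yet meet $U$ and adds the vertex as soon as that counter reaches zero:

```python
from collections import defaultdict, deque


def construct_u(n, supported):
    """Steps 1-2: U starts as the vertices supporting no edge; a vertex joins U once
    every edge it supports contains a vertex of U."""
    clean = {v: len(es) for v, es in supported.items()}  # supported edges not yet meeting U
    edges_through = defaultdict(list)  # w -> [(supporter v, edge index i)] with w in that edge
    for v, es in supported.items():
        for i, e in enumerate(es):
            for w in e:
                if w != v:
                    edges_through[w].append((v, i))
    in_u = [False] * n
    queue = deque(v for v in range(n) if clean.get(v, 0) == 0)  # step 1
    for v in queue:
        in_u[v] = True
    hit = set()  # (supporter, edge index) pairs already known to meet U
    while queue:  # step 2
        w = queue.popleft()
        for v, i in edges_through[w]:
            if (v, i) in hit or in_u[v]:
                continue
            hit.add((v, i))
            clean[v] -= 1
            if clean[v] == 0:
                in_u[v] = True
                queue.append(v)
    return {v for v in range(n) if in_u[v]}
```

Vertices that do not appear in `supported` are treated as supporting no edge, and hence start in $U$.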

As a first step, we are going to show that $|U|$ is tightly concentrated. More precisely, in Section 6.2 we will prove the following.

Proposition 6.1 For any two functions $f_0(n) = o(n)$, $f_1(n) = o(n)$ there is a function $f_2(n) = o(n)$ such that
$$\mathrm{P}\left[\big||U| - \mathrm{E}_{H_k(n,m,\sigma_0)}|U|\big| > f_2(n)\right] \le \exp(-f_1(n)).$$

We also need the following simple expansion properties. 

Lemma 6.2 W.h.p. both $H_k(n,m)$ and $H_k(n,m,\sigma)$ have the following property.

For any set $S \subset V$ of size $|S| \le 2^{-k}n$ the number of edges $e$ that contain at least two vertices from $S$ is bounded by $1.01|S|$. (40)

Furthermore, with probability $1 - \exp(-\Omega(n))$, $H_k(n,m,\sigma)$ has the following property.

For any set $S \subset V$ of size $2^{-k}n < |S| \le n/k^5$ the number of critical edges $e$ that contain at least two vertices from $S$ is bounded by $1.01|S|$. (41)



Proof. This follows from a simple first moment argument similar to the ones in Section 5. □

Using Proposition 6.1 and Lemma 6.2 we will derive the following in Section 6.3.



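Proposition 6.3 Let $\Psi_k(n,m) = \mathrm{E}_{H_k(n,m,\sigma_0)}\ln|\mathcal{C}(\sigma_0)|$. For any $f_0(n), f_1(n) = o(n)$ there is a function $f_3(n) = o(n)$ such that
$$\mathrm{P}_{H_k(n,m,\sigma)}\left[\big|\Psi_k(n,m) - \ln|\mathcal{C}(\sigma)|\big| > f_3(n) \text{ and (40) holds}\right] \le \exp(-f_1(n)). \qquad (42)$$
Furthermore, for any $1 \le j \le n^{2/3}$ we have
$$0 \le \Psi_k(n,m) - \Psi_k(n,m+j) = o(n^{3/4}). \qquad (43)$$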



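Proof of Corollary 1.4. To begin, let us fix a small $\varepsilon > 0$. Our first goal is to show that there exists a density $r = r(n)$ such that
$$\frac{1}{n}\mathrm{E}_{H_k(n,m,\sigma)}\ln|\mathcal{C}(\sigma)| \sim \frac{1}{n}\ln\mathrm{E}_{H_k(n,m)}Z - \varepsilon. \qquad (44)$$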

To prove (44), it is easier to work with the random hypergraph $H_k(n,p,\sigma)$ in which each $e \subset V$ of size $k$ that is bicolored under $\sigma$ is inserted with probability $p$ independently. Then for any fixed $n$, the function
$$F_n(p) = \frac{1}{n}\mathrm{E}_{H_k(n,p,\sigma)}\ln|\mathcal{C}(\sigma)|$$
is a polynomial in $p$. Furthermore, it is clear that $F_n(p) \to \ln 2$ as $p \to 0$, while $F_n(1) = o(1)$. For any $p$ we let $\rho(p) \ge 0$ be such that the expected number of edges in $H_k(n,p,\sigma)$ equals $\rho(p)n$. Then by the intermediate value theorem, there exists $p$ such that $F_n(p) \sim \frac{1}{n}\ln\mathrm{E}_{H_k(n,\lceil\rho(p)n\rceil)}Z - \varepsilon$. Since the actual number of edges of $H_k(n,p,\sigma)$ is binomially distributed and therefore tightly concentrated about $\rho(p)n$, the 'continuity property' (43) ensures that

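$$\frac{1}{n}\mathrm{E}_{H_k(n,\lceil\rho(p)n\rceil,\sigma)}\ln|\mathcal{C}(\sigma)| \sim F_n(p) \sim \frac{1}{n}\ln\mathrm{E}_{H_k(n,\lceil\rho(p)n\rceil)}Z - \varepsilon.$$

Setting $r_\varepsilon(n) = \rho(p(n))$, we obtain (44).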

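For this density $r = r_\varepsilon(n)$ there exists a function $f_1(n) = o(n)$ such that
$$\ln\mathcal{G}_{k,n,m}[\mathcal{B}] \le \ln\mathcal{P}_{k,n,m}[\mathcal{B}] + f_1(n) \quad \text{for any event } \mathcal{B} \ne \emptyset. \qquad (45)$$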

Let $f_0(n)$ be such that with probability $1 - \exp(-2f_1(n))$, a random $\sigma \in \{0,1\}^n$ satisfies $\big||\sigma^{-1}(0)| - |\sigma^{-1}(1)|\big| \le f_0(n)$. Combining Lemma 6.2, Proposition 6.3, (44) and (45), we see that for these densities $r_\varepsilon(n)$, w.h.p. a random pair $(H, \sigma)$ chosen from the Gibbs distribution is such that
$$\frac{1}{n}\ln|\mathcal{C}(\sigma)| \ge \frac{1}{n}\ln\mathrm{E}[Z] - \varepsilon \ge \frac{1}{n}\ln Z(H) - 2\varepsilon. \qquad (46)$$
Since (46) holds w.h.p. for any fixed $\varepsilon > 0$, there exist sequences $\varepsilon(n) \to 0$ and $r(n)$ as desired. □

6.2 Proof of Proposition 6.1

We are going to trace the process for the construction of the set $U$ via the method of differential equations [21]. To obtain sufficient concentration from this approach, we will have to modify the process slightly. The modified process will yield a subset $U_\omega \subset U$ whose size is tightly concentrated. We will then see how $U_\omega$ can be enhanced to a superset $U^* \supset U$ whose size, with very high probability, does not significantly exceed the size of $U_\omega$.

Our construction of $U_\omega$ comes with a parameter $\omega \ge \omega_0$, where $\omega_0$ denotes a large constant (later we will let $\omega \to \infty$ slowly as $n \to \infty$). To construct $U_\omega$, we consider a similar process as in the proof of Lemma 5.7, but we only run this process on the set $V'$ of vertices that support at most $\omega$ edges. In each step, any vertex $w \in V'$ is either alive, dead, or neutral. Initially, all vertices in $V'$ that do not support an edge are alive, and all others are neutral. The process stops once there is no alive vertex left. In each step, an alive vertex $v$ is chosen randomly. Let $d_v$ be the number of edges $e_1, \ldots, e_{d_v}$ supported by neutral vertices in which $v$ occurs.



Case 1: $d_v \le \omega$. All of $e_1, \ldots, e_{d_v}$ are deleted from the hypergraph.

Case 2: $d_v > \omega$. In this case $\omega$ edges amongst $e_1, \ldots, e_{d_v}$ are chosen randomly and are deleted from the hypergraph. Moreover, the remaining $d_v - \omega$ edges are changed as follows. Suppose that the deleted edges are $e_1, \ldots, e_\omega$. Then $v$ is replaced in each edge $e \in \{e_{\omega+1}, \ldots, e_{d_v}\}$ independently by a random vertex $w \ne v$ with $\sigma(w) = \sigma(v)$ that is not dead and that does not belong to $e$ already; if there is no such vertex $w$ left, the process stops.

Finally, all neutral vertices that do not support an edge anymore (after the edge deletions described above) are declared alive, and $v$ is declared dead. Let $T$ be the stopping time of the process, and let $U_\omega$ be the set of dead vertices upon termination. Then $|U_\omega| = T$.

The difference between the above process and the actual construction of $U$ is that the latter runs on the entire set $V$ (not just $V'$) and that it always removes all of $e_1, \ldots, e_{d_v}$. Therefore, $U_\omega \subset U$.

To trace the construction of $U_\omega$, we need to define a few random variables. For each $1 \le s \le \omega$ and each $1 \le l \le s$ let $X_t(s,l)$ denote the number of neutral vertices that support $s$ edges in total, out of which $l$ do not contain a vertex that has died by the end of step $t$. In addition, let $A_t$ signify the number of alive vertices. Let $(\mathcal{F}_t)_{t \ge 0}$ be the filtration generated by the random variables $X_t(s,l)$ and $A_t$.

Let $D_s$ be the number of vertices that support precisely $s$ edges ($s \ge 0$). Moreover, let $D_{>\omega}$ be the number of vertices that support more than $\omega$ edges.

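Lemma 6.4 We have
$$\mathrm{P}\left[D_{>\omega} > \exp(-\omega)n\right] \le \exp\left(-\frac{n}{2\exp(2\omega)r}\right).$$
Furthermore, for any $0 \le s \le \omega$ we have
$$\mathrm{P}\left[|D_s - \mathrm{E}D_s| > \exp(-\omega^2)n\right] \le \exp\left(-\frac{n}{2\exp(2\omega^2)r}\right).$$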



Proof. For each vertex $v$ the number $s(v)$ of edges supported by $v$ has a binomial distribution with mean $\lambda = kr/(2^{k-1}-1)$. Assuming that ($k$ and thus) $\lambda$ is sufficiently large, and choosing $\omega_0$ big enough, we see from Chernoff bounds that $\mathrm{P}[s(v) > \omega] \le \exp(-2\omega)$. Hence, $\mathrm{E}D_{>\omega} \le n\exp(-2\omega)$. Furthermore, $D_{>\omega}$ satisfies a Lipschitz condition: adding or removing a single edge can alter the value of $D_{>\omega}$ by at most one. Therefore, the first assertion follows from Azuma's inequality. Similarly, adding or removing a single edge can change the value of $D_s$ by at most one, and thus Azuma's inequality also implies the second claim. □
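The exponents in Lemma 6.4 are those produced by the Azuma–Hoeffding inequality. As a sketch, assume that the $m = rn$ edges are generated independently and exposed one at a time, and that changing a single edge moves the variable by at most $c$:

```latex
% Azuma-Hoeffding for a function X of m independent choices with bounded differences c:
%   P[|X - E X| >= x] <= 2 exp(-x^2 / (2 c^2 m)).
% First bound of Lemma 6.4: X = D_{>\omega}, c = 1, x = \exp(-\omega) n, m = rn:
\[
  \frac{x^2}{2c^2 m} = \frac{\exp(-2\omega)\,n^2}{2rn} = \frac{n}{2\exp(2\omega)\,r}.
\]
% Second bound: X = D_s, c = 1, x = \exp(-\omega^2) n, which gives n / (2 \exp(2\omega^2) r).
```

Since $\mathrm{E}D_{>\omega}$ is exponentially smaller in $\omega$ than $x$, the event $D_{>\omega} > \exp(-\omega)n$ forces a deviation of $(1 - o_\omega(1))x$, which affects the exponent only by a $(1 - o_\omega(1))$ factor. If a replaced edge is counted as one removal plus one addition, one gets $c = 2$ instead, which only changes the constant in the exponent.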

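Lemma 6.5 For any $1 \le t \le \min\{T, n/k^5\}$ we have
$$\mathrm{E}[X_{t+1}(s,l) \mid \mathcal{F}_t] = X_t(s,l)\left(1 - \frac{l(k-1)}{n-t}\right) + X_t(s,l+1)\cdot\frac{(l+1)(k-1)}{n-t} + o(1), \qquad (47)$$
$$\mathrm{E}[A_{t+1} \mid \mathcal{F}_t] = A_t - 1 + \frac{k-1}{n-t}\sum_{s=1}^{\omega} X_t(s,1) + o(1). \qquad (48)$$
Furthermore, with certainty,
$$|X_{t+1}(s,l) - X_t(s,l)| \le \omega, \qquad |A_{t+1} - A_t| \le \omega. \qquad (49)$$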



Proof. This is a standard argument for a differential equations analysis, based on the following observation ('method of deferred decisions'): given the history $\mathcal{F}_t$ of the process up to time $t$, for each neutral vertex $w$ each remaining edge $e$ supported by $w$ is conditioned only to the effect that $e$ does not contain a vertex that has died by time $t$. Thus, the alive vertex $v$ chosen at time $t+1$ has a probability of $1 - (1 - 1/(n-t))^{k-1} \sim (k-1)/(n-t)$ of occurring in each edge supported by a neutral vertex, and these events are independent for all such edges. Furthermore, since each neutral vertex only supports at most $\omega$ edges (as we confine ourselves to the set $V'$), the probability that $v$ occurs in two such edges is $o(1)$.

This means that, given $\mathcal{F}_t$, the expected number of vertices that support $s$ edges in total, out of which $l$ are left after time $t$, and which support an edge in which $v$ occurs, equals
$$X_t(s,l)\frac{l(k-1)}{n-t} + o(1). \qquad (50)$$
Furthermore, given $\mathcal{F}_t$, the expected number of vertices that support $s$ edges in total with $l+1$ left after time $t$, amongst which precisely one contains $v$, is
$$X_t(s,l+1)\frac{(l+1)(k-1)}{n-t} + o(1). \qquad (51)$$
To obtain (47) from this, we need to take into account the 'exceptional' case 2 of the process. But since the expected number of occurrences of $v$ given $\mathcal{F}_t$ is bounded by $\lambda n/(n-t) \le 2\lambda$, and since this number is binomially distributed, the probability that $v$ occurs in more than $\omega$ edges supported by neutral vertices is bounded by $\exp(-\omega)$. This estimate in combination with (50) and (51) yields (47). Equation (48) follows from a similar argument, and (49) is immediate from the construction. □

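Corollary 6.6 There exists a number $0 < \mu = \mu(k,r) < 2\exp(-\lambda)$ and a function $\delta_\omega = o_\omega(1)$ such that
$$\mathrm{P}\left[|T/n - \mu| < \delta_\omega\right] \ge 1 - \exp\left(-\frac{n}{\exp(\omega^4)}\right). \qquad (52)$$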



Proof. Lemma 6.4 and Lemma 6.5 verify the assumptions of [21, Theorem 5.1] for times $t \le n/k^5$. Furthermore, Proposition 5.3 shows that $T \le 2n\exp(-\lambda) < n/k^5$ w.h.p. Therefore, we can apply [21, Theorem 5.1] to obtain (52). □



Remark 6.7 The 'method of differential equations' [21, Theorem 5.1] actually shows that the random variables $X_t(s,l)$ closely trace a system of ordinary differential equations. From these, the number $\mu$ in Corollary 6.6 could, in principle, be worked out precisely for any given $k, r$. However, for our purposes it is not important to know $\mu$ precisely. In the proof of Corollary 6.6 it is important to use the differential equations approach as in [21] to ensure sufficient concentration.
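To illustrate Remark 6.7, the following Python sketch numerically integrates the system of ordinary differential equations obtained heuristically from (47) and (48) by writing $x_{s,l}(\tau) \approx X_{\tau n}(s,l)/n$ and $\alpha(\tau) \approx A_{\tau n}/n$ and dropping the $o(1)$ terms; the first $\tau$ with $\alpha(\tau) = 0$ serves as a proxy for $\mu$. The initial conditions are assumptions made only for this sketch: the number of supported edges per vertex is taken to be Poisson with mean $\lambda$ (in place of the binomial distribution), vertices supporting no edge start alive, and all other vertices of $V'$ start neutral with all of their edges intact. It is a plain explicit Euler scheme, not the rigorous argument of [21], and the function name is illustrative.

```python
import math


def fluid_limit_mu(k, lam, omega=30, dtau=1e-5, tau_max=0.5):
    """Euler integration of the ODEs suggested by (47)-(48).

    x[s][l] ~ X_t(s, l)/n and alive ~ A_t/n at time tau = t/n. Returns the first tau at
    which alive reaches 0 (a heuristic proxy for mu), or tau_max if that never happens.
    """
    # Assumed initial state: Poisson(lam) supported edges per vertex, all edges intact.
    x = [[0.0] * (s + 2) for s in range(omega + 1)]  # x[s][s + 1] stays 0 as a sentinel
    for s in range(1, omega + 1):
        x[s][s] = math.exp(-lam) * lam ** s / math.factorial(s)
    alive = math.exp(-lam)  # vertices that support no edge start alive
    tau = 0.0
    while alive > 0 and tau < tau_max:
        rate = (k - 1) / (1 - tau)
        # Neutral vertices with exactly one intact edge become alive when that edge is hit.
        inflow = rate * sum(x[s][1] for s in range(1, omega + 1))
        new_x = [row[:] for row in x]
        for s in range(1, omega + 1):
            for l in range(1, s + 1):
                new_x[s][l] += dtau * rate * (-l * x[s][l] + (l + 1) * x[s][l + 1])
        x = new_x
        alive += dtau * (inflow - 1.0)  # one alive vertex dies per step
        tau += dtau
    return tau


if __name__ == "__main__":
    lam = 5.0  # illustrative value of lambda
    print(fluid_limit_mu(8, lam), 2 * math.exp(-lam))
```

Since $\mathrm{d}\alpha/\mathrm{d}\tau \ge -1$, the returned value is never smaller than the initial fraction $\exp(-\lambda)$ of alive vertices (up to one step size), in line with $T \ge |A_0|$.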

Since $|U_\omega| = T$ and $U_\omega \subset U$, Corollary 6.6 provides a lower bound on the size of $U$. As a next step, we will derive an (asymptotically) matching upper bound.

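Lemma 6.8 There is a function $\delta_\omega = o_\omega(1)$ such that
$$\mathrm{P}\left[|U \setminus U_\omega| > \delta_\omega n\right] \le 3\exp\left(-\frac{n}{\exp(\omega^4)}\right).$$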



Proof. We use a similar argument as in the proof of Lemma 5.7. Namely, having constructed $U_\omega$, we commence a second process. Again, in the course of this process vertices can be alive, dead, or neutral. Initially, all vertices in $U_\omega$ are dead. Furthermore, a vertex $v \notin U_\omega$ is declared alive if either $v \in V \setminus V'$ (i.e., $v$ supports more than $\omega$ edges), or $v$ supports an edge that contains a vertex that occurs in more than $\omega$ edges supported by other vertices. All other vertices are neutral. From this initial state, the process proceeds just like the construction of the set $U$. Namely, in each step an alive vertex $v$ is chosen, unless there is none left, in which case the process stops. Then, all vertices $w$ such that each edge $e$ supported by $w$ contains either $v$ or a dead vertex are declared alive, and $v$ dies. Clearly, the set of dead vertices of this process upon termination contains $U$.

By Corollary 6.6 we may assume that $|U_\omega| \le 2n\exp(-\lambda)$. Standard arguments (similar to the proof of Lemma 5.6) show that with probability at least $1 - \exp(-n/\exp(\omega^4))$ the total number of vertices that are alive initially is $o_\omega(1)n$. Furthermore, using stochastic dominance as in the proof of Lemma 5.7, one can show that the above process will terminate after only $o_\omega(1)n$ steps with probability at least $1 - \exp(-n/\exp(\omega^4))$. □

Proof of Proposition 6.1. Corollary 6.6 and Lemma 6.8 show that there exist $\mu > 0$ and, for any $\omega \ge \omega_0$, some $\delta_\omega > 0$ such that
$$\mathrm{P}\left[\big||U| - \mu n\big| > \delta_\omega n\right] \le \exp(-n/\exp(\omega^4)), \qquad (53)$$
where $\delta_\omega \to 0$ as $\omega \to \infty$. We may assume that the given function $f_1(n) = o(n)$ satisfies $f_1(n) \ge \sqrt{n}$, and that $\delta_\omega \ge 1/\omega$. Given such an $f_1(n)$, we can choose a slowly growing function $\omega = \omega(n) \le \ln\ln n$ such that $\exp(-n/\exp(\omega^4)) \le \exp(-f_1(n))$. Then (53) implies that $\big|\mathrm{E}|U| - \mu n\big| \le 2\delta_{\omega(n)}n$. Thus, setting $f_2(n) = 3\delta_{\omega(n)}n$ and invoking (53) once more completes the proof. □

6.3 Proof of Proposition 6.3



To prove Proposition 6.3 it will be easier to work with a slightly different distribution over the hypergraphs for which $\sigma$ is a 2-coloring. Namely, let $H_k(n,p,\sigma)$ denote a random hypergraph obtained by including each possible edge that is 2-colored under $\sigma$ with probability $p$ independently. Throughout this section, we fix $p$ so that the expected number of edges of $H_k(n,p,\sigma)$ is equal to $m$. Due to our assumptions on $\sigma$, this means that $p \sim m/\big((1 - 2^{1-k})\binom{n}{k}\big)$.

Lemma 6.9 For any event $E$, $\mathrm{P}[H_k(n,m,\sigma) \in E] \le O(\sqrt{n})\,\mathrm{P}[H_k(n,p,\sigma) \in E]$.

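Proof. In $H_k(n,p,\sigma)$ the total number of edges has a binomial distribution with mean $m$. Therefore, the probability that $H_k(n,p,\sigma)$ has exactly $m$ edges is $\Theta(1/\sqrt{m}) = \Theta(1/\sqrt{n})$. Furthermore, given that its total number of edges is $m$, $H_k(n,p,\sigma)$ is uniformly distributed over all such hypergraphs for which $\sigma$ is a 2-coloring. Hence $\mathrm{P}[H_k(n,m,\sigma) \in E]$ equals the probability that $H_k(n,p,\sigma) \in E$ conditional on having exactly $m$ edges, which is at most $\mathrm{P}[H_k(n,p,\sigma) \in E]/\Theta(1/\sqrt{n})$. □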
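The estimate $\Theta(1/\sqrt{m})$ is the local limit theorem for the binomial distribution at its mean: if $N$ trials have success probability $p$ and the mean $Np$ equals the integer $m$, then $\mathrm{P}[\mathrm{Bin}(N,p) = m] \sim 1/\sqrt{2\pi Np(1-p)}$. The Python sketch below checks this numerically in log space; the values of $N$ and $p$ are arbitrary illustrations and are not taken from the paper.

```python
import math


def binom_pmf(N, p, m):
    """P[Bin(N, p) = m], evaluated via log-gamma to avoid overflow and underflow."""
    log_pmf = (math.lgamma(N + 1) - math.lgamma(m + 1) - math.lgamma(N - m + 1)
               + m * math.log(p) + (N - m) * math.log1p(-p))
    return math.exp(log_pmf)


def local_limit(N, p):
    """Local-limit approximation 1/sqrt(2*pi*N*p*(1-p)) of the probability of hitting the mean."""
    return 1.0 / math.sqrt(2 * math.pi * N * p * (1 - p))


if __name__ == "__main__":
    N, p, m = 10**6, 1e-3, 1000  # illustrative: mean N*p = m = 1000 edges
    print(binom_pmf(N, p, m), local_limit(N, p))
```

The two values agree to within a relative error of about $10^{-4}$ here, so conditioning on exactly $m$ edges costs only a factor of order $\sqrt{m}$ in probability.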

The argument for the proof of Proposition 6.3 basically is as follows. We will see that (essentially) all vertices in $V \setminus U$ are rigid, and thus the entropy of the local cluster stems solely from variations of the colors in $U$. Furthermore, the hypergraph induced on $U$ by the edges that do not already contain two vertices from $V \setminus U$ with different colors is sub-critical, i.e., it decomposes into small (at most $O(\ln n)$ but mostly constant-sized) connected components. Now, for each 'type' (i.e., isomorphism class) of component the number of occurrences of this type is tightly concentrated (similarly as in a subcritical random graph). This implies concentration of the logarithm of the total number of colorings on $U$, because the logarithm of the total number is simply the sum of the logarithms of the numbers of colorings of the components. Let us now carry out the details.

Consider the following way to construct a set $C \subset V$ of vertices of $H_k(n,m,\sigma)$ (cf. Section 5.3). Let $l = 10$.

C1. Initially, let $C$ contain all $v \in V$ that support at least $l/2$ edges.

C2. While there is $v \in C$ that supports fewer than $l/2$ edges consisting of vertices of $C$ only, remove $v$ from $C$.

C3. While there is a vertex $v \in V \setminus C$ that supports an edge $e$ such that $e \setminus \{v\} \subset C$, add $v$ to $C$.

Proposition 6.10 For any function $g_1(n) = o(n)$ there is a function $g_2(n) = o(n)$ such that with probability at least $1 - \exp(-g_1(n))$ the set $C$ has the following properties.

1. $\big||V \setminus C| - |U|\big| \le g_2(n)$.

2. Either (40) is violated, or any 2-coloring $\tau \in \mathcal{C}(\sigma)$ satisfies $\tau(v) = \sigma(v)$ for all $v \in C$.

Proof. The same arguments as in Section 5.3 apply. □

For a set $C \subset V$ let $\mathcal{A}(C)$ be the set of all $e \subset V$, $|e| = k$, that have neither of the following two properties.

1. $e \subset C$.

2. There is a color $i \in \{0,1\}$ such that $|e \cap \sigma^{-1}(i)| = 1$ and $|e \cap \sigma^{-1}(1-i) \cap C| = k-1$. (In other words, $e$ is critical with respect to $\sigma$ and has $k-1$ vertices, not including the supporting one, in $C$.)

The reason why it is easier, for the present context, to work with the $H_k(n,p,\sigma)$ model is the following simple observation. If we condition on the outcome $C \subset V$ of the process C1–C3, each $e \in \mathcal{A}(C)$ that is bichromatic under $\sigma$ is present as an edge in $H_k(n,p,\sigma)$ with probability $p$ independently. That is, the distribution of $H_k(n,p,\sigma)$ outside the 'core' $C$ can be captured very easily.

Given the outcome $C$ of the process C1–C3, let $H_k(n,p,\sigma,C)$ denote the random hypergraph on $V \setminus C$ in which we include the set $e \setminus C$ for each edge $e$ of $H_k(n,p,\sigma)$ such that $|e \setminus C| \ge 2$.

Lemma 6.11 Suppose $|V \setminus C| \le 3\exp(-\lambda)n$. W.h.p. all connected components of $H_k(n,p,\sigma,C)$ have size $O(\ln n)$. Furthermore, for any $\omega = \omega(n) \to \infty$ the expected number of vertices of $H_k(n,p,\sigma,C)$ that belong to components of size at least $\omega$ is bounded by $\exp(-\Omega(\omega))n$.

Proof. Given the above observation, the assertion is a direct consequence of the result on the 'giant component' phase transition in random non-uniform hypergraphs from [20]. □

Let $\mathcal{T}$ be the set of all equivalence classes with respect to isomorphism of hypergraphs with edges of size at most $k$. An isolated copy of $T \in \mathcal{T}$ in $H_k(n,p,\sigma,C)$ is a subset $S \subset V \setminus C$ such that $S$ is a component of $H_k(n,p,\sigma,C)$ and such that the sub-hypergraph induced on $S$ is isomorphic to $T$. Let $Y_{T,C}$ signify the number of isolated copies of $T \in \mathcal{T}$ in $H_k(n,p,\sigma,C)$ (given the set $C$).

Lemma 6.12 For any $T \in \mathcal{T}$ and any $d > 0$ we have
$$\mathrm{P}\left[|Y_{T,C} - \mathrm{E}Y_{T,C}| > d\right] \le \exp\left(-\Omega(d^2/n)\right). \qquad (54)$$
Furthermore, if $|V \setminus C| \le 3\exp(-\lambda)n$, then for any $\omega = \omega(n) \to \infty$ we have
$$\sum_{T \in \mathcal{T} : |V(T)| \le \omega} |V(T)| \cdot \mathrm{E}[Y_{T,C}] \ge |V \setminus C| - \exp(-\Omega(\omega))n.$$

Proof. The random variable $Y_{T,C}$ satisfies a Lipschitz condition: either adding or removing an edge to/from $H_k(n,p,\sigma,C)$ can change $Y_{T,C}$ by at most $k$. Therefore, the first assertion follows from Azuma's inequality. The second one is an immediate consequence of (54) and Lemma 6.11. □

We will now drop the conditioning upon the outcome of the process C1–C3. That is, we let $\tilde{H}_k(n,p,\sigma)$ be the random hypergraph obtained by first constructing $C$ in $H_k(n,p,\sigma)$ and then performing the construction of $H_k(n,p,\sigma,C)$. For each $T \in \mathcal{T}$ let $Y_T$ be the number of isolated copies of $T$ in $\tilde{H}_k(n,p,\sigma)$.

Corollary 6.13 For any function $g_1(n) = o(n)$ and any $\omega = \omega(n) \to \infty$ there exists $g_2(n) = o(n)$ such that the following is true. For each $T \in \mathcal{T}$ there is a number $y_T = y_T(k,r) \ge 0$ such that with probability at least $1 - \exp(-g_1(n))$ either (40) is violated or the following is true.

1. All but $g_2(n)$ vertices of $\tilde{H}_k(n,p,\sigma)$ belong to a component on at most $\omega$ vertices.

2. We have $\sum_{T \in \mathcal{T}} |V(T)| \cdot |Y_T - y_T n| \le 2g_2(n)$.

Proof. This is immediate from Proposition 6.10 (which, crucially, shows that $|C|$ is tightly concentrated) and Lemma 6.12. □

For each $T \in \mathcal{T}$ let $z_T$ denote the number of 2-colorings of $T$. Furthermore, let $Z(\tilde{H}_k(n,p,\sigma))$ denote the number of 2-colorings of $\tilde{H}_k(n,p,\sigma)$.

Corollary 6.14 For any function $g_1(n) = o(n)$ there exists $g_2(n) = o(1)$ such that with probability at least $1 - \exp(-g_1(n))$ either (40) is violated or we have
$$\left|\frac{1}{n}\ln Z(\tilde{H}_k(n,p,\sigma)) - \sum_{T \in \mathcal{T}} y_T \ln z_T\right| \le g_2(n).$$
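The identity behind Corollary 6.14 is that a 2-coloring of a hypergraph amounts to an independent choice of a 2-coloring on each connected component, so the logarithm of the number of 2-colorings is a sum over components; grouping the components by isomorphism type $T$ turns this sum into $\sum_T Y_T \ln z_T$. The following Python sketch illustrates the identity on small hypergraphs. It counts colorings by brute force within each component, so it assumes that components are small (as in the sub-critical regime of Lemma 6.11) and that every component is 2-colorable; the function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product


def count_colorings(vertices, edges):
    """Brute-force number of 2-colorings (no monochromatic edge) of a small hypergraph."""
    vertices = list(vertices)
    index = {v: i for i, v in enumerate(vertices)}
    total = 0
    for colors in product((0, 1), repeat=len(vertices)):
        if all(len({colors[index[v]] for v in e}) == 2 for e in edges):
            total += 1
    return total


def components(vertices, edges):
    """Connected components via union-find, as a list of (vertex list, edge list) pairs."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for e in edges:
        root = find(e[0])
        for w in e[1:]:
            parent[find(w)] = root
    comp_vertices, comp_edges = defaultdict(list), defaultdict(list)
    for v in vertices:
        comp_vertices[find(v)].append(v)
    for e in edges:
        comp_edges[find(e[0])].append(e)
    return [(comp_vertices[r], comp_edges[r]) for r in comp_vertices]


def log_colorings(vertices, edges):
    """ln Z computed as the sum of ln(#2-colorings) over the connected components."""
    return sum(math.log(count_colorings(vs, es)) for vs, es in components(vertices, edges))
```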



Proof of Proposition 6.3. Let us first deal with the random hypergraph $\tilde{H}_k(n,p,\sigma)$. Suppose that $|C| \ge n(1 - 2\exp(-\lambda))$. Then any 2-coloring $\tau$ of $H_k(n,p,\sigma,C)$ yields an element of the local cluster $\mathcal{C}_{H_k(n,p,\sigma)}(\sigma)$ of $H_k(n,p,\sigma)$ by letting $\tau(v) = \sigma(v)$ for all $v \in C$. Therefore, Proposition 6.10 and Corollary 6.14 imply that
$$\mathrm{P}_{H_k(n,p,\sigma)}\left[\frac{1}{n}\ln|\mathcal{C}(\sigma)| \ge \sum_{T \in \mathcal{T}} y_T \ln z_T - o(1)\right] \ge 1 - \exp(-f_1(n)). \qquad (55)$$



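Conversely, consider a coloring $\tau \in \mathcal{C}(\sigma)$ of $H_k(n,p,\sigma)$. By Proposition 6.10 either (40) is violated or $\tau(v) = \sigma(v)$ for all $v \in C$. Assume that the latter is true. Then $\tau$ induces a 2-coloring of $H_k(n,p,\sigma,C)$, and Corollary 6.14 yields
$$\frac{1}{n}\ln|\mathcal{C}_{H_k(n,p,\sigma)}(\sigma)| \le \sum_{T \in \mathcal{T}} y_T \ln z_T + o(1). \qquad (56)$$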



Hence,
$$\mathrm{P}_{H_k(n,p,\sigma)}\left[\text{either (40) is violated or } \frac{1}{n}\ln|\mathcal{C}(\sigma)| \le \sum_{T \in \mathcal{T}} y_T \ln z_T + o(1)\right] \ge 1 - \exp(-f_1(n)). \qquad (57)$$

Combining (55) and (57) with Lemma 6.9, we obtain (42).

Finally, to obtain (43), observe that adding a further $j \le n^{2/3}$ edges to $H_k(n,m,\sigma)$ can connect at most $n^{2/3}$ components of $H_k(n,p,\sigma,C)$ with the set $C$. Lemma 6.11 shows that w.h.p. all of these components have size at most $n^{0.01}$. Hence, w.h.p. the total reduction in the logarithm of the number of 2-colorings is at most $n^{2/3+0.01}\ln 2 = o(n^{3/4})$. □